IDENTIFICATION OF STREPTOCOCCUS PNEUMONIAE SEROTYPES

FIELD OF THE INVENTION 

The present invention relates to molecular methods of serotyping Streptococcus
pneumoniae, as well as polynucleotides useful in such methods.

BACKGROUND OF THE INVENTION 

Streptococcus pneumoniae is a leading cause of morbidity and mortality, causing
invasive disease such as meningitis and pneumonia as well as more localised disease
such as acute otitis media and sinusitis. Polysaccharide and protein-conjugate
pneumococcal vaccines have the potential to prevent a significant proportion of cases.
Effective protein-conjugate vaccines are particularly important because of the dramatic
increase in prevalence and international dissemination of antibiotic-resistant S.
pneumoniae serotypes that commonly cause invasive disease in children (Hausdorff et
al., 2001; Huebner et al., 2000). However, these vaccines protect against only the
relatively small minority (Dunne et al., 2001; Hausdorff et al., 2001) of pneumococcal
serotypes that most commonly cause disease. There is theoretical and limited empirical
evidence that widespread use of these vaccines could lead to substitution of "vaccine"
serotypes with other nonvaccine serotypes, against which the vaccines do not provide
protection. Continued surveillance will be critical to monitor vaccine efficacy and
changes in the incidence and distribution of colonising and invasive serotypes (Hausdorff
et al., 2001; Rubins et al., 1999). Any increase in disease caused by previously
uncommon nonvaccine serotypes could necessitate a change in vaccine composition
(Lipsitch, 2001).

S. pneumoniae comprises at least 90 serotypes, distinguished by capsular
polysaccharide antigens. Pneumococcal serotype/group identification is currently
performed using large panels of expensive antisera, by various methods including the
capsular swelling (Quellung) reaction (the traditional "gold standard"), latex
agglutination and coagglutination (Aral et al., 2001; Lalitha et al., 1999).
Cross-reactions between serotypes and discrepancies between methods can occur, and
some strains are nonserotypable (Henrichsen, 1999).

The capsular polysaccharide synthesis (cps) gene clusters for at least 16
pneumococcal serotypes have been sequenced and serotype-specific genes identified
(Jiang et al., 2001; van Selm et al., 2002). The cps gene cluster contains genes
responsible for synthesis of the serotype-specific polysaccharide including, except in
serotype 3, wzy (polysaccharide polymerase gene) and wzx (polysaccharide flippase
gene). At the 5'-end of the cps gene cluster are four relatively conserved open reading
frames: cpsA (wzg)-cpsB (wzh)-cpsC (wzd)-cpsD (wze). Sequence differences in this
region were used to classify 11 S. pneumoniae serotypes into two classes and, in the
region between the 3'-end of cpsA and the 5'-end of cpsB, there were sites of
heterogeneity between and within serotypes (Jiang et al., 2001; Lawrence et al., 2000).
S. pneumoniae is characterised by high-frequency recombination within the cps gene
cluster, leading to serotype "switching" among isolates within genetic lineages defined
by relationships between their more conserved housekeeping genes (Coffey et al.,
1998; Jiang et al., 2001).

The relatively low percentage of polymorphisms between strains that is linked
to actual serotype, together with the large number of different serotypes, has made it
difficult to develop assays that can type a significant portion of S. pneumoniae
strains. Accordingly, there is a need for further methods which can be used to
identify different Streptococcus pneumoniae serotypes.



SUMMARY OF THE INVENTION

Through the complex analysis of a large number of polymorphisms which exist
between at least 132 molecular capsular sequence types of Streptococcus pneumoniae,
the present inventors have devised methods which can be used to distinguish between a
majority of different S. pneumoniae serotypes. In particular, prior art nucleic acid-based
typing techniques could serotype only about 20 serotypes of S. pneumoniae. In
contrast, the methods of the invention can be used to serotype most of the
approximately 90 serotypes of S. pneumoniae. The methods of the invention can also be
used to subtype some serotypes.

Thus, in a first aspect, the present invention provides a method of distinguishing
between at least 25 different serotypes of Streptococcus pneumoniae in a sample, the
method comprising,

i) analysing at least a portion of the nucleotide sequence between the 3' end of
the cpsA gene and the 5' end of the cpsB gene, and/or
ii) analysing at least a portion of the wzy and/or wzx gene(s).

Preferably, the method can be used to type at least 40, more preferably at least
50, more preferably at least 70, more preferably at least 90, more preferably at least
100, even more preferably at least about 132 different molecular capsular sequence
types of S. pneumoniae.

The present inventors are the first to provide suitable nucleic acid-based
techniques for typing a large number of Streptococcus pneumoniae serotypes.

Accordingly, in another aspect the present invention provides a method of determining
the serotype of Streptococcus pneumoniae in a sample, the method comprising,

i) analysing at least a portion of the nucleotide sequence between the 3' end of
the cpsA gene and the 5' end of the cpsB gene, and/or
ii) analysing at least a portion of the wzy and/or wzx gene(s),

wherein the serotype is selected from the group consisting of: 2, 7A, 7B, 7C, 9A, 9L,
10F, 10A, 10B, 10C, 11F, 11A, 11B, 11C, 11D, 12F, 12A, 12B, 13, 15F, 15A, 15B,
15C, 16A, 17F, 17A, 18F, 18A, 18B, 21, 22F, 22A, 24F, 24A, 24B, 25F, 25A, 27, 28F,
28A, 31, 32F, 32A, 33F, 33A, 33B, 33C, 33D, 34, 35A, 35B, 35C, 36, 37, 38, 39, 40,
41F, 41A, 42, 43, 44, 45, 46, 47, 47A and 48.

The present inventors have surprisingly found that at least about 102 molecular
capsular sequence types of S. pneumoniae can be directly serotyped by analysing the
region between the 3' end of the cpsA gene and the 5' end of the cpsB gene of the
S. pneumoniae genome.

Thus, in another aspect the present invention provides a method of determining
the serotype of Streptococcus pneumoniae in a sample, the method comprising
analysing at least a portion of the nucleotide sequence between the 3' end of the cpsA
gene and the 5' end of the cpsB gene.

In a preferred embodiment, the portion of the nucleotide sequence between the
3' end of the cpsA gene and the 5' end of the cpsB gene which is analysed is any
nucleotide which is polymorphic between at least some of the S. pneumoniae serotypes
referred to in Figure 2.
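One way to make "any nucleotide which is polymorphic" concrete is to scan a multiple alignment such as that of Figure 2 column by column. The Python sketch below assumes the aligned sequences are available as equal-length strings keyed by a label; the function name and the toy sequences are illustrative only and are not data from Figure 2. Treating gaps and N as uninformative is a simplifying choice of this sketch; indels could equally be scored as polymorphisms.

```python
def polymorphic_columns(alignment: dict[str, str]) -> list[int]:
    """Return 1-based alignment columns at which at least two sequences differ.

    Gap ('-') and unknown ('N') characters are ignored, so a column counts as
    polymorphic only if two or more distinct nucleotides are observed in it.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for i in range(length):
        bases = {s[i].upper() for s in seqs} - {"-", "N"}
        if len(bases) > 1:
            columns.append(i + 1)  # report 1-based positions, as in an alignment figure
    return columns


# Toy alignment with hypothetical labels; not sequences from Figure 2.
toy_alignment = {
    "type_A": "ACGTACGT",
    "type_B": "ACGAACGT",
    "type_C": "ACG-ACTT",
}
print(polymorphic_columns(toy_alignment))  # [4, 7]
```

Column 4 is reported because T and A both occur there (the gap in type_C is ignored); column 7 because G and T occur.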

In a particularly preferred embodiment, the method comprises amplifying at
least a portion of the nucleotide sequence between the 3' end of the cpsA gene and the
5' end of the cpsB gene, and sequencing the amplification product. More preferably,
the entire approximately 800 bp region as provided in Figure 2 is amplified and
sequenced.

In the case of sequencing to identify the serotype, the sequencing primers are
selected such that they hybridise specifically to a region within or near to a region
within which a polymorphism is present. The primers need not be specific to particular
serotypes since it is the actual sequence information obtained during the sequencing
process which is used to determine the S. pneumoniae serotype. Thus the primers may
hybridise specifically to genomic DNA from all S. pneumoniae serotypes (or at least
those serotypes referred to in Figure 2), or to genomic DNA from some, but not all, S.
pneumoniae serotypes.
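Because it is the sequence read itself, rather than primer specificity, that determines the type, the final step amounts to comparing the read with reference sequences for each type. The following is a minimal sketch, assuming the read has already been aligned to the same coordinates as the references (for example those of Figure 2); the function names, labels and sequences are hypothetical. All equally close references are returned rather than a single winner, since the text notes that some serotypes share sequence in this region.

```python
def mismatches(a: str, b: str) -> int:
    """Count mismatches between two aligned sequences, skipping '-' and 'N' columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x not in skip and y not in skip and x != y
    )


def closest_types(query: str, references: dict[str, str]) -> tuple[list[str], int]:
    """Return every reference label tied for the fewest mismatches, and that count."""
    if not references:
        raise ValueError("no reference sequences supplied")
    scores = {name: mismatches(query, seq) for name, seq in references.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best), best


# Hypothetical, already-aligned reference sequences (not taken from Figure 2).
refs = {"type_A": "ACGTACGT", "type_B": "ACGAACGT", "type_C": "ACGAACGT"}
print(closest_types("ACGAACGA", refs))  # (['type_B', 'type_C'], 1)
```

A tie such as type_B/type_C above is the in-silico counterpart of the shared-sequence cases that the text resolves with additional wzy/wzx analysis.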

When a portion of the nucleotide sequence between the 3' end of the cpsA gene
and the 5' end of the cpsB gene is amplified, it is preferable that the amplification is
performed using primer pairs comprising a sequence selected from the group consisting
of:

1) GGCATT(/C)TATGGAGTTGATTCG(/A)TCCATT(/C)CACAC(C/T)TTAG
(SEQ ID NO:68) and
GC(/T)TCAATG(/A)TGG(/A)GCAATG(/T)ACTGGA(/C:OGTA(/G)ATTCCCA(/G)ACATC
(SEQ ID NO:73),

2) GGCATT(/C)TATGGAGTTGATTCG(/A)TCCATT(/C)CACACC(/T)TTAG
(SEQ ID NO:68) and
CCATCAC(/T)ATAGAGGTTAC(/A)TG(/A)TCTGGCATT(/C)GC (SEQ ID NO:71),

3) GAAAGTGGG(/A/T)GGG(/A/T)A(/G)A(/C)T(/G)TAT(/C)AAAGTA(/G)AATTCT(/G)CAAGAT(/C)TTA(/G)AAA(/G)G
(SEQ ID NO:70) and
T(/G)CATG(/A)CTA(/G)AAC(/T)TCT(/A)ATC(/T)AAG(/A)GCATAACGACTATC(/T)
(SEQ ID NO:72), and

4) primer pairs that amplify the same region, or diagnostic portion thereof, from
the genome of a strain of S. pneumoniae as the primers provided in 1) to 3).
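The primer sequences above appear to use "X(/Y)" to mean X or Y at that position (with "X(/Y/Z)" for three alternatives) and, once, "(C/T)" for a position that may be C or T. Reading the notation that way, which is an interpretation rather than a definition given in the text, each primer can be rewritten in standard IUPAC ambiguity codes and its degeneracy (the number of distinct oligonucleotides in the mixture) counted. The function names are hypothetical; the sketch rejects any symbol other than A, C, G and T, so unexpected characters are flagged rather than silently accepted.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def parse_degenerate(primer: str) -> list[set[str]]:
    """Turn 'X(/Y)' or '(X/Y)' notation into one set of allowed bases per position.

    Assumed reading: 'T(/C)' means T or C at a single position, 'G(/A/T)' means
    G, A or T, and a bracket that does not start with '/' (e.g. '(C/T)') is a
    position of its own.
    """
    positions: list[set[str]] = []
    i = 0
    while i < len(primer):
        ch = primer[i]
        if ch.isspace():
            i += 1
            continue
        if ch == "(":
            end = primer.index(")", i)  # raises ValueError if the bracket is unclosed
            parts = primer[i + 1:end].split("/")
            alts = {p.upper() for p in parts if p}
            if parts[0] == "":
                if not positions:
                    raise ValueError("alternative base given before any base")
                positions[-1] |= alts  # '(/Y)' extends the preceding base
            else:
                positions.append(alts)  # '(X/Y)' stands alone
            i = end + 1
        else:
            positions.append({ch.upper()})
            i += 1
    for bases in positions:
        if not bases <= set("ACGT"):
            raise ValueError(f"unrecognised symbol(s): {sorted(bases - set('ACGT'))}")
    return positions


def to_iupac(primer: str) -> str:
    """Rewrite a degenerate primer using IUPAC ambiguity codes."""
    return "".join(IUPAC[frozenset(bases)] for bases in parse_degenerate(primer))


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides the degenerate primer represents."""
    return prod(len(bases) for bases in parse_degenerate(primer))


fwd = "GGCATT(/C)TATGGAGTTGATTCG(/A)TCCATT(/C)CACACC(/T)TTAG"  # SEQ ID NO:68
print(to_iupac(fwd))    # GGCATYTATGGAGTTGATTCRTCCATYCACACYTTAG
print(degeneracy(fwd))  # 16
```

Under this reading, the two renderings of SEQ ID NO:68 given in pairs 1) and 2) ("CACAC(C/T)" and "CACACC(/T)") yield the same IUPAC string, which is a useful consistency check.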

In an alternate embodiment, the nucleotide sequence analysis step comprises
determining whether a polynucleotide obtained from S. pneumoniae selectively
hybridises to a polynucleotide probe comprising one or more polymorphic regions of
the nucleotide sequence between the 3' end of the cpsA gene and the 5' end of the cpsB
gene, wherein such polymorphic regions are shown in Figure 2. More preferably, the
nucleotide sequence analysis step comprises a plurality of said polynucleotide probes.
In a particularly preferred embodiment, where hybridisation to a plurality of probes is
used as a means of analysis, the plurality of polynucleotide probes are present as a
microarray.
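Hybridisation to a set of probes is a wet-laboratory measurement, but the signal pattern expected for a known sequence can be predicted in silico. The sketch below uses exact substring matching on both strands as a crude stand-in for selective hybridisation; this is an assumption, since real probe behaviour depends on melting temperature, stringency and tolerated mismatches. The function names, probe names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (other characters pass through)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_hits(target: str, probes: dict[str, str]) -> dict[str, bool]:
    """Report, for each probe, whether it occurs exactly on either target strand.

    A probe identical to a stretch of one strand can anneal to the other strand,
    so both the target and its reverse complement are searched.
    """
    top = target.upper()
    bottom = reverse_complement(top)
    return {name: p.upper() in top or p.upper() in bottom for name, p in probes.items()}


# Hypothetical probes and target; not sequences from Figure 2.
probes = {"probe_1": "CCCGGG", "probe_2": "ACGTAA", "probe_3": "TTTGGG"}
print(probe_hits("AAACCCGGGTTTACGT", probes))
# {'probe_1': True, 'probe_2': True, 'probe_3': False}
```

Comparing such a predicted on/off pattern with the observed array signal for each reference type is one way the probe results could be interpreted.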

It has been noted that the method of analysing at least a portion of the nucleotide
sequence between the 3' end of the cpsA gene and the 5' end of the cpsB gene does not
enable the identification of all known S. pneumoniae serotypes. For example, shared
sequences were noted in the following cases: 6A and 6B; 10A and 17A; 10A and 23F;
23F and 23A; 15B, 15C, 22F and 22A; and 17F, 35B, 35C and 42. Accordingly, in these
instances further analysis will need to be performed to determine the correct serotype.
To this end, the present inventors have discovered that polymorphisms in the wzy
and/or wzx genes can also be useful for S. pneumoniae serotyping.

Accordingly, in a further aspect the present invention provides a method of
determining the serotype of Streptococcus pneumoniae in a sample, the method
comprising analysing at least a portion of the wzy and/or wzx gene(s).

In a preferred embodiment, the method comprises amplifying at least a portion
of the wzy and/or wzx gene(s), and determining the length of the amplification product.

In a particularly preferred embodiment, at least a portion of the wzy and/or wzx
gene(s) is amplified using primer pairs comprising a sequence selected from the group
consisting of:

1) GTAGGTGTAGTTTTTTCAGGGACTTTAATTTTATGCAGTG (SEQ ID NO:74) and
TCGCTTAACACAATGGCTTTAGAAGGTAGAG (SEQ ID NO:75),

2) GTTATTTTATTTTTTTTGTCGGCATTGTATTCm (SEQ ID NO:76) and
CAAATTCATCGTTTGTATCCATTTAACTGCATC (SEQ ID NO:77),

3) CTTATATCTAATTATGTTCCGTCTATATTTATATGGGTTTGCTTTC (SEQ ID NO:78) and
TTTCTCTTCATTTTCCTGATAATTTTGTACTTCTGAATG (SEQ ID NO:79),

4) ATGCTTTTAAATTTCTTATTCATATCTATTTTTC (SEQ ID NO:80) and
GTAAACAGAGAGCGAGTGATCATTTTAAAACTTTTGG (SEQ ID NO:83),

5) G(/A)GATTTT(/G)TTTCAACCT(/C)GCAGTAATTTTAACAA(/C)TC(/T)G(/A) (SEQ ID NO:81) and
CCTGAAAACAA(/G)TACT(/C)ACrTTCTGAATTTCA(X/T)GGA(/G)^^ (SEQ ID NO:82),

6) GTTTTATTGACTTTAAAGATGTTAGTTTCTTCGATTCCAG (SEQ ID NO:84) and
TTTTTATTACTCTrcrTAAATCATAATGAATCGTACCAATCAAC (SEQ ID NO:85),

7) GGATCAATGGCAACTATATTTACCCTACTCTCCACAG (SEQ ID NO:86) and
GAGTCGAAACCAACCGGAAAAAGCAATTGAG (SEQ ID NO:87),

8) CCTTTGGTTTATTATCCTACTTCCAAAACAGTTTATGC (SEQ ID NO:88) and
CATATATCTCTTTATCCTGTCAATATTGATTGGCATTTTC (SEQ ID NO:89),

9) GATAITAGCTATACCAACAATTGTTCTTTTCCTGTACTCAGTC (SEQ ID NO:91) and
GCATTTCTAGTACCGAACCATTGAAACTATCATCTG (SEQ ID NO:93),

10) GAAATTATAGTCGGAGCTTTCATTTATATTAGTTTACTGGTTCTG (SEQ ID NO:90) and
CAGAATAAAGAGAGCTGTAATAGGTGCAACTTCATGC (SEQ ID NO:93),

11) CTGTAATGTTTCTAATTAGTTCAGTATTTGCACTGGTTAATTC (SEQ ID NO:94) and
CCCGTATATCCATTACTAAGAACAAGGTTGTATATTTCCTTC (SEQ ID NO:95),

12) GTTTCTCATTAGTTCTGTATTTGCCCTTATTAATGTGC (SEQ ID NO:96) and
CCATGGCTAAGTGCAAGATTATGAATCTCTCTC (SEQ ID NO:97),

13) GTTTCTTATGTTTACCCTCAGCTTATATTGGCACAG (SEQ ID NO:98) and
GATACCACAAATCTCCGAATTCTCTTAAAATAGATGG (SEQ ID NO:99),

14) TTAAGTAGTTCACAAGTGATAGTGAACTTGGGATTGTC (SEQ ID NO:100) and
CACTGAGATTATTTATTAGCTTTATCGGTAAGGTGGATAAG (SEQ ID NO:101),

15) ATTACTTGTAATACTATGTATTCAACTAGTCA(/C)AGGATTTGATGG (SEQ ID NO:103) and
GAACAAATTTCCGTATCAGATTTGCGATTTC (SEQ ID NO:104),

16) CCAATGAAAAGGAAAGTTCAATGTGTTTTGTTTCTGC (SEQ ID NO:102) and
GGTGCTTCAGCAAAAATCCCCGTATTTCTTATCAG (SEQ ID NO:105),

17) TAGCTGATGTTCCGATAAATTATGGTGGGGTAATAATAG (SEQ ID NO:106) and
CTGCGACACTGTATATACCTACATTATAACTACTAGACATTTGC (SEQ ID NO:107),

18) GCAACTITGGTTCTAAAATTTTAGTCmTTTAATGGTTCC (SEQ ID NO:108) and
TGTTAAACCCCAATATAGAAATTGTATTGAGAATAGCAGC (SEQ ID NO:109),

19) CGTTAATAGCTTATGTTCAACTGGTGATTGATTTTGG (SEQ ID NO:110) and
TGATAGTTTTAGAAATAATATAAGGAATTGCAACTGCATGC (SEQ ID NO:111),

20) TTCATGTC(/T)T(/C)TTTTG(/A)TCTAATCTGATTACAATTG(/C)TC(/T)ACATCG(/A) (SEQ ID NO:113) and
T(/C)GCATTTG(/T)GATCTGTCACAA(/G)TCAATAAGTTAAAACC (SEQ ID NO:114),

21) GGTAGGTATTTTAATTGGAGGAAGAGAGTCTTGAATGG (SEQ ID NO:112) and
ATCTTCCCTTCATAAATTGACATAGGAAAAATAAGAGCC (SEQ ID NO:115),

22) CAATTCTAACTATGTCCAGTTTTATTTTTCCACTCATCAG (SEQ ID NO:116) and
GACGTGATAATAATAAGCTGCCATTCCTGTCTAAAACG (SEQ ID NO:117),

23) CGGCGGTATTAAGTAGAATATTAACACCTGAAGAGTATGGC (SEQ ID NO:118) and
GGCAATCAGACTCAATAAGTTCATCCGTTTAAAGTTC (SEQ ID NO:119),

24) GGTATTGCCnTrCCTTTGATAACTTCTCCTrATTTATCAC (SEQ ID NO:120) and
TGAACTTGTAACTCGACACCCAAAAATATAAATAAATGAG (SEQ ID NO:121),

25) GAATCGGACAATAGCACAGGTACGAACAAG (SEQ ID NO:123) and
GCCATGTAATCAACTGACCAAGCAGGGTACTC (SEQ ID NO:124),

26) CAAAGGAACGTTATCAGCAATTGTGTCAAATTTCAG (SEQ ID NO:122) and
AAGATTAGGGCGCACAAAGTTTACTTGTTTTAGC (SEQ ID NO:125),

27) GTTATTTCrTCAAATCTGCTCATAGTTTTAACCTCATCAC (SEQ ID NO:126) and
TATCTTGCGTTTTCATCCCTTACAGTTATTAGGTTCAAAG (SEQ ID NO:127),

28) TTCTTCAAATCTTTTGACAGTCTTGACCTCTTCCTTG (SEQ ID NO:128) and
TATCGTGCATTCGAATCTGTTACAGCTAATACATTTAAAC (SEQ ID NO:129),

29) GTCCTGACGCTATCAAATATCATTTTCCCATTAATCAC (SEQ ID NO:130) and
CCCACATGTGATCAATAGGAGTGAAAATTCTCTATTC (SEQ ID NO:131),

30) GCTTTGGCTAACnTrTCATCAAAGATTTTAATTT^^ (SEQ ID NO:133) and
CCAGAGATAGCTGTAACACCAATTTTATCAATTCCCTTAG (SEQ ID NO:134),

31) CCTTTGGCTAATTTCTTGGACGATAATGAATTTGTATATG (SEQ ID NO:132) and
CCACAAACATTAGCAATAAAGAAACCTAACAATCCC (SEQ ID NO:135),

32) GATCATACTCCCTATCATTACGACTCCCTATGTAACG (SEQ ID NO:137) and
CCAAGAAATATCCAAACCTTTTGACACTAAACTTAATCC (SEQ ID NO:138),

33) GTTGTTTTAGCTCAAGGAGGGATAATGTTGGCTTCG (SEQ ID NO:136) and
GCTGATTTTACAAATAGGAAAATAGAGATTGCACCAAC (SEQ ID NO:139), and

34) a primer comprising a sequence selected from any one of SEQ ID NO's 144
to 333, and

35) a primer that can be used to amplify the same region, or diagnostic portion
thereof, from the genome of a strain of S. pneumoniae as a primer provided as any one
of SEQ ID NO's 75 to 139 or 144 to 333.

Guidance regarding the serotypes these primer pairs target, and the length of
resulting amplification products, is provided in Tables 2, 3 and 7.
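Tables 2, 3 and 7 are not reproduced in this section, but the way such a table would be used can be sketched: an observed band size for a given primer pair is compared with the expected product sizes, within a tolerance, and every matching serotype is reported. Returning several serotypes is deliberate, because some primer pairs also amplify related serotypes. The class and function names, table rows, labels, sizes and the 10 bp default tolerance below are placeholders, not values from those tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedProduct:
    primer_pair: str
    serotypes: tuple[str, ...]
    length_bp: int


def call_serotypes(primer_pair: str, observed_bp: int,
                   table: list[ExpectedProduct], tolerance_bp: int = 10) -> list[str]:
    """Return every serotype whose expected product for this primer pair lies
    within tolerance_bp of the observed band size (empty list if none)."""
    hits: set[str] = set()
    for row in table:
        if row.primer_pair == primer_pair and abs(row.length_bp - observed_bp) <= tolerance_bp:
            hits.update(row.serotypes)
    return sorted(hits)


# Placeholder rows: labels and sizes are invented, not copied from Tables 2, 3 or 7.
table = [
    ExpectedProduct("pair_1", ("type_P",), 500),
    ExpectedProduct("pair_1", ("type_Q", "type_R"), 350),
    ExpectedProduct("pair_2", ("type_S",), 500),
]
print(call_serotypes("pair_1", 495, table))  # ['type_P']
print(call_serotypes("pair_1", 352, table))  # ['type_Q', 'type_R']
print(call_serotypes("pair_1", 420, table))  # []
```

A result listing more than one serotype signals that further analysis, such as serological typing, is needed.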

It has been noted that some of the above primer pairs produced non-serotype-specific
amplicons. For example: PCR targeting serotype 6B also amplified 6A; PCR
targeting 18C amplified all serotypes in serogroup 18; PCR targeting wzx (but not wzy)
of serotype 23F amplified three serotype 23A strains; and PCR targeting wzx and wzy of
serotypes 33/37 amplified a 33A isolate, while that targeting wzx amplified a serotype
33B isolate. Accordingly, in these instances further analysis will need to be performed
to determine the correct serotype. For instance, traditional serological typing can be
performed.

Serotype 3 does not contain wzy and wzx genes. Accordingly, upon obtaining
results using the method of analysing at least a portion of the nucleotide sequence
between the 3' end of the cpsA gene and the 5' end of the cpsB gene, the presence of
serotype 3 can be confirmed by analysing the orf2 (wze)-cap3A-cap3B region.
Preferably, serotype 3 is identified by amplifying a portion of the orf2 (wze)-cap3A-cap3B
region using primer pairs selected from the group consisting of:

1) GGACAAAAAAAAGTTTGATATTGCGGTTGAGAATAG (SEQ ID NO:140) and
GGAGGATGTAAGGAGGGITGAAGATTGAAGTG (SEQ ID NO:141),

2) CGAAGGTAGTATTGAGTGTGATAGTTTTATGGGATAGAGAG (SEQ ID NO:142) and
CTGAGAGGATGAAAATATATAAGCGGCGAAGGAATAAG (SEQ ID NO:143), and

3) primer pairs that amplify the same region, or diagnostic portion thereof, from
the genome of a strain of S. pneumoniae as the primers provided in 1) or 2).

During routine analysis of a sample comprising bacteria it will typically be
desirable to ensure that the sample being analysed actually contains Streptococcus
pneumoniae. Thus, it is preferred that the methods of the present invention include
detecting any serotype of Streptococcus pneumoniae in the sample.

Such methods are known in the art and include, but are not limited to,
amplifying portions of the psaA and/or pneumolysin genes followed by detection of the
amplification products.

In a preferred embodiment, a portion of the psaA gene is amplified using
primers comprising the sequences
TACATTACTCGTTCTCTTTCTITCTGCAATCATTCTTG (SEQ ID NO:64) and
TAGTAGCTGTCGCCTTCTTTACCTTGTTCTGC (SEQ ID NO:65), or primer pairs
that amplify the same region, or diagnostic portion thereof, from the genome of a strain
of S. pneumoniae as SEQ ID NO:64 and SEQ ID NO:65. In another preferred
embodiment, a portion of the pneumolysin gene is amplified using primers comprising
the sequences AGAATAATCCCACTCTTCTTGCGGTTGA (SEQ ID NO:66) and
CATGCTGTGAGCCGTTATTTTTTCATACTG (SEQ ID NO:67), or primer pairs that
amplify the same region, or diagnostic portion thereof, from the genome of a strain of
S. pneumoniae as SEQ ID NO:66 and SEQ ID NO:67.

The present inventors have observed a strong correlation between the molecular
capsular sequence typing techniques of the invention and the actual serotype of a strain
as determined by traditional antibody-based serological typing. However, the typing
methods of the invention may be assisted by further serotyping the S. pneumoniae
strain. For instance, to ensure that recombination events have not occurred, upon typing
with the methods of the invention the serotype can be confirmed by serologically
typing for the serotype suggested by the methods of the invention. Furthermore, the
inventors have noted that a few serotypes are difficult to resolve using the methods of
the invention, for example: 6A and 6B; 15B and 15C; 22F and 22A; and 35C and 42.
Upon identification of any of these serotypes by the molecular techniques of the
invention, the serotype can be unequivocally typed using traditional serological
methods.
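The sequence of decisions described above (confirm the species, type molecularly, then fall back to further analysis or serology where the molecular methods do not separate serotypes) can be summarised as a small triage rule. This is a hypothetical sketch: the function name, the return messages, and the assumption that species detection is a single yes/no result (for example from psaA and/or pneumolysin PCR) are illustrative. Only the serotype groups are taken from the text.

```python
# Serotype groups the text reports as difficult to resolve by the molecular
# methods, and for which serological typing is suggested.
SEROLOGY_CONFIRM = [{"6A", "6B"}, {"15B", "15C"}, {"22F", "22A"}, {"35C", "42"}]


def next_step(species_detected: bool, candidates: set[str]) -> str:
    """Hypothetical triage rule combining species detection and molecular typing.

    `candidates` holds the serotypes still consistent with the molecular results.
    """
    if not species_detected:
        return "no S. pneumoniae detected; do not serotype"
    if not candidates:
        return "no molecular type assigned; serotype serologically"
    if len(candidates) == 1:
        return f"report serotype {next(iter(candidates))}"
    if any(candidates <= group for group in SEROLOGY_CONFIRM):
        return "confirm serologically"
    return "resolve with wzy/wzx PCR, then confirm serologically if still ambiguous"


print(next_step(True, {"6A", "6B"}))  # confirm serologically
```

A set such as {"15B", "15C", "22F", "22A"}, which the text lists as sharing cpsA-cpsB sequence, falls through to the wzy/wzx step because it is not contained in any single serology group.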

In another aspect, the present invention provides an isolated polynucleotide
comprising a sequence of nucleotides selected from those provided as SEQ ID NO's 2
to 63, or a fragment thereof which is at least 10 nucleotides in length, with the proviso
that the polynucleotide does not comprise the entire wzy and/or wzx gene(s) of a S.
pneumoniae serotype selected from the group consisting of: 1, 2, 4, 6A, 6B, 8, 9V, 14,
18C, 19F, 19A, 19B, 23F, 33F and 37, or the entire wzx gene of S. pneumoniae
serotype 19C.

In a further aspect, the present invention provides an isolated polynucleotide
comprising a sequence of nucleotides selected from the group consisting of:
1-AF532632, 10A-AF532633, 10A-AF532634, 10B-AY508586, 10F-AF532635,
10F-AF532636, 10F-AY508587, 11A-AF532637, 11A-AF532638, 11B-AF532639,
11C-AY508588, 11C-AY508589, 12A-AY508590, 12A-AY508591, 12F-AF532640,
12F-AF532641, 13-AF532642, 14-AF532643, 14-AF532644, 14-AF532645,
15A-AF532646, 15A-AF532647, 15B-AF532648, 15B-AF532649, 15B-AF532650,
15C-AF532651, 15C-AF532652, 15C-AY330714, 15C-AY330715, 15C-AY508592,
15C-AY508593, 15F-AY508594, 15F-AY508595, 16A-AY508596, 16F-AF532653,
16F-AF532654, 17A-AF532655, 17A-AY508597, 17F-AF532656, 17F-AF532657,
18A-AF532658, 18A-AF532659, 18B-AF532660, 18C-AF532661, 18F-AF532662,
18F-AY330716, 18F-AY508598, 19A-AF532663, 19A-AF532664, 19B-AY508599,
19C-AY508600, 19C-AY508601, 19F-AF532665, 19F-AF532666, 19F-AF532667,
19F-AF532668, 2-AF532669, 20-AF532670, 21-AF532671, 21-AY508602,
22A-AF532672, 22F-AF532673, 23A-AF532674, 23A-AF532675, 23B-AF532676,
23B-AY330717, 23F-AF532677, 23F-AF532678, 23F-AF532679, 24A-AY508603,
24B-AY508604, 24F-AY508605, 24F-AY508606, 24F-AY508607, 25F-AF532711,
27-AY508608, 28A-AY508609, 28F-AY508610, 28F-AY508611, 29-AF532680,
29-AY330718, 3-AF532681, 3-AF532682, 3-AF532683, 31-AF532684, 32A-AY508612,
32A-AY508613, 32F-AY508614, 33A-AF532685, 33B-AF532686, 33B-AY508615,
33C-AY508616, 33F-AF532687, 33F-AF532688, 33F-AF532689, 34-AF532690,
35A-AY508617, 35B-AF532691, 35C-AY508618, 35F-AF532692, 36-AY508619,
37-AF532713, 38-AF532712, 39-AY508620, 39-AY508621, 4-AF532693, 40-AY508622,
41A-AY508623, 41F-AY508624, 42-AY508625, 43-AY508626, 45-AY508628,
46-AY508629, 47A-AY508630, 47F-AY508631, 48-AY508632, 48-AY508633,
5-AF532696, 5-AF532697, 5-AY508634, 6A-AF532698, 6A-AF532699, 6A-AF532700,
6A-AF532701, 6A-AF532702, 6A-AY508641, 6B-AF532703, 6B-AF532704,
6B-AF532705, 7A-AY508635, 7B-AY508636, 7C-AF532706, 7F-AF532707,
8-AF532708, 9A-AY508637, 9L-AY508638, 9N-AF532709, 9V-AF532710 and
9V-AY508639 as provided in Figure 2, or a fragment thereof which is at least 10
nucleotides in length, with the proviso that the polynucleotide does not comprise the
region from the 3' end of the cpsA gene to the 5' end of the cpsB gene of a
S. pneumoniae serotype selected from the group consisting of: 1, 2, 3, 4, 6A, 6B, 8,
9V, 14, 18C, 19F, 19A, 23F, 33F and 37.

In a preferred embodiment, the polynucleotide of these aspects is at least 15
nucleotides, more preferably at least 20 nucleotides, more preferably at least 25
nucleotides, more preferably at least 30 nucleotides, more preferably at least 50
nucleotides in length, and even more preferably at least 100 nucleotides in length.

In a further aspect, the present invention provides an isolated polynucleotide
consisting essentially of 10 to 50 contiguous nucleotides corresponding to a portion of
the 3' end of the cpsA S. pneumoniae gene or the 5' end of the cpsB S. pneumoniae
gene.

In a further aspect, the present invention provides a polynucleotide consisting
essentially of 10 to 50 contiguous nucleotides corresponding to a portion of the S. 
pneumoniae wzy and/or wzx gene(s). 

Preferably, said polynucleotide of 10 to 50 contiguous nucleotides comprises
one or more nucleotides which differ between different S. pneumoniae serotypes.

Polynucleotides of 10 to 50 contiguous nucleotides can be used as amplification 
primers, or as probes, for the identification of different S. pneumoniae serotypes. 

Preferably, the nucleotides which differ between S. pneumoniae serotypes
correspond to one or more of the positions shown in Figure 2.
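Designing a 10 to 50 nucleotide primer or probe that covers a serotype-discriminating nucleotide can start from a simple window around each polymorphic position. The sketch below is illustrative only: the function name and reference string are hypothetical, and it merely centres a fixed-length window on each 1-based position, shifting it inwards at the sequence ends. It does not check melting temperature, self-complementarity or uniqueness, which a real design would need.

```python
def probe_windows(seq: str, positions: list[int], length: int = 21) -> dict[int, str]:
    """For each 1-based polymorphic position, return a window of `length` bases
    containing it, centred where possible and shifted inwards at the ends."""
    if not 10 <= length <= 50:
        raise ValueError("length must be between 10 and 50 nucleotides")
    if length > len(seq):
        raise ValueError("sequence is shorter than the requested window")
    windows = {}
    for pos in positions:
        if not 1 <= pos <= len(seq):
            raise ValueError(f"position {pos} lies outside the sequence")
        i = pos - 1  # convert to 0-based index
        start = min(max(0, i - length // 2), len(seq) - length)
        windows[pos] = seq[start:start + length]
    return windows


ref = "ACGT" * 10  # 40-nt toy reference, not a pneumococcal sequence
print(probe_windows(ref, [1, 20, 40], length=11))
# {1: 'ACGTACGTACG', 20: 'GTACGTACGTA', 40: 'CGTACGTACGT'}
```

Centring places the discriminating base away from the probe ends, where a mismatch typically has the most effect on hybridisation; at the sequence ends the window is simply clipped.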
Preferably, the polynucleotide is detectably labelled. The label can be any
suitable label known in the art including, but not limited to, radionuclides, enzymes,
fluorescent and chemiluminescent labels.

Also provided is a vector comprising a polynucleotide of the invention.
Preferably, the vector is an expression vector. Furthermore, provided is a host cell
comprising a vector of the invention. Suitable vectors and host cells would be well
known to those skilled in the art.

In yet another aspect, the present invention provides a composition comprising a
plurality of polynucleotides according to the invention and an acceptable carrier or
excipient. Preferably, the carrier or excipient is water or a suitable buffer. The
composition may be used in methods of typing different S. pneumoniae serotypes.

In a further aspect the present invention provides a microarray comprising a
plurality of polynucleotides according to the invention. The microarray may be used in
methods of typing different S. pneumoniae serotypes.

In another aspect, the present invention provides a kit comprising at least one
polynucleotide of the present invention.

Preferably, the polynucleotide is 10 to 50 nucleotides in length. In one 
embodiment, the kit further comprises reagents necessary for nucleic acid 
amplification. In another embodiment, the polynucleotide is detectably labelled and the 
kit further comprises means for detecting the labelled polynucleotide. 

As will be apparent, preferred features and characteristics of one aspect of the
invention are applicable to many other aspects of the invention.

Throughout this specification the word "comprise", or variations such as
"comprises" or "comprising", will be understood to imply the inclusion of a stated
element, integer or step, or group of elements, integers or steps, but not the exclusion of
any other element, integer or step, or group of elements, integers or steps.

The invention is hereinafter described by way of the following non-limiting
examples and with reference to the accompanying figures. 

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

Figure 1. The genomic sequence of the cpsA (wzg) and cpsB (wzh) genes of serotype 4 of
S. pneumoniae as published by Jiang et al. (2001) and deposited as GenBank Accession
Number AF316639. The remaining 3' sequence of GenBank Accession Number
AF316639 has not been provided. Nucleotides 1520 to 2965 encode cpsA whilst
nucleotides 2967 to 3698 encode cpsB.

Figure 2. Multiple sequence alignments for the region between the 3'-end of cpsA
(wzg) and the 5'-end of cpsB (wzh) of 132 molecular capsular sequence types of S.
pneumoniae. The alignment numbering start point "1" refers to position "2470" of the
S. pneumoniae serotype 4 cpsA (wzg) gene (GenBank accession number: AF316639)
(Figure 1).

Figure 3. Phylogenetic tree inferred from sequences in the region between the 3'-end of
cpsA (wzg) and the 5'-end of cpsB (wzh) genes for 132 molecular capsular sequence
types of S. pneumoniae. Most of the tree input sequences are from Figure 2; for
GenBank accession numbers see Tables 1 and 8.

Figure 4. Phylogenetic tree of wzx genes of 83 S. pneumoniae cps serotypes. The tree
is generated by the neighbour-joining method based on all nucleotide sites.

Figure 5. Phylogenetic tree of wzy genes of a total of 83 S. pneumoniae cps serotypes. The
tree is generated by the neighbour-joining method based on all nucleotide sites.
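Figures 4 and 5 are described as neighbour-joining trees built from all nucleotide sites. The sketch below shows the standard neighbour-joining algorithm of Saitou and Nei applied to a pairwise distance matrix. Filling that matrix with uncorrected p-distances is an assumption, since the distance model is not stated here; the function names are hypothetical, and the five-taxon matrix in the usage example is a generic illustration, not pneumococcal data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbour_joining(names: list[str], d: list[list[float]]) -> list[tuple[str, str, float]]:
    """Neighbour joining on a symmetric distance matrix.

    Returns the edges (node, node, branch length) of the unrooted tree; internal
    nodes are named node1, node2, ... in the order they are created.
    """
    nodes = list(names)
    dist = {(a, b): d[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    edges: list[tuple[str, str, float]] = []
    new_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[a, k] for k in nodes if k != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        branch_a = dist[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = dist[a, b] - branch_a
        new_id += 1
        u = f"node{new_id}"
        edges += [(u, a, branch_a), (u, b, branch_b)]
        rest = [k for k in nodes if k not in (a, b)]
        for k in rest:  # distances from the new internal node to the remaining nodes
            dist[u, k] = dist[k, u] = (dist[a, k] + dist[b, k] - dist[a, b]) / 2
        dist[u, u] = 0.0
        nodes = rest + [u]
    edges.append((nodes[0], nodes[1], dist[nodes[0], nodes[1]]))
    return edges


names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
for edge in neighbour_joining(names, D):
    print(edge)  # first two edges: ('node1', 'a', 2.0), ('node1', 'b', 3.0)
```

For n taxa the procedure yields 2n - 3 edges; ties in Q are broken here by taking the first pair in iteration order, which is an arbitrary but deterministic choice.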

Figure 6. Schematic representation of the closely related wzx genes. Each block
represents wzx genes from one or more S. pneumoniae serotype cps gene clusters.
Similar patterns and shading represent regions with DNA sequence identity > 75%
among different nucleotide sequences.

KEY TO THE SEQUENCE LISTING 

SEQ ID NO:1 - Genomic sequence of cpsA (wzg) and cpsB (wzh) genes of serotype 4
of S. pneumoniae (Figure 1).

SEQ ID NO:2 - Partial sequence of strain 00-251-3185 wzx gene.
SEQ ID NO:3 - Partial sequence of strain 01-122-0226 wzx gene.
SEQ ID NO:4 - Partial sequence of strain 01-192-2471 wzx gene.
SEQ ID NO:5 - Partial sequence of strain MA055100 wzx gene.
SEQ ID NO:6 - Partial sequence of strain NZSPN01/329 wzx gene.
SEQ ID NO:7 - Partial sequence of strain 00-256-1986 wzx gene.
SEQ ID NO:8 - Partial sequence of strain NZSPN01/276 wzx gene.
SEQ ID NO:9 - Partial sequence of strain 00-201-1422 wzx gene.
SEQ ID NO:10 - Partial sequence of strain 00-211-1669 wzx gene.
SEQ ID NO:11 - Partial sequence of strain 00S002 wzx gene.
SEQ ID NO:12 - Partial sequence of strain 00-251-3185 wzy gene.
SEQ ID NO:13 - Partial sequence of strain 01-122-0226 wzy gene.
SEQ ID NO:14 - Partial sequence of strain 01-192-2471 wzy gene.
SEQ ID NO:15 - Partial sequence of strain MA055100 wzy gene.
SEQ ID NO:16 - Partial sequence of strain NZSPN01/329 wzy gene.
SEQ ID NO:17 - Partial sequence of strain 00-256-1986 wzy gene.
SEQ ID NO:18 - Partial sequence of strain NZSPN01/276 wzy gene.
SEQ ID NO:19 - Partial sequence of strain 00-201-1422 wzy gene.
SEQ ID NO:20 - Partial sequence of strain 00-211-1669 wzy gene.
SEQ ID NO:21 - Partial sequence of strain 00S002 wzy gene.
SEQ ID NO:22 - Partial sequence of strain NZSPN01/509 cpsI and wzx genes.
SEQ ID NO:23 - Partial sequence of strain MA050408 cpsI and wzx genes.
SEQ ID NO:24 - Partial sequence of strain MA052433 cpsI and wzx genes.
SEQ ID NO:25 - Partial sequence of strain 00S009 cpsI and wzx genes.
SEQ ID NO:26 - Partial sequence of strain 99-325-0373 cpsI and wzx genes.
SEQ ID NO:27 - Partial sequence of strain NZSPN00/454 cpsI and wzx genes.
SEQ ID NO:28 - Partial sequence of strain NZSPN00/484 cpsI and wzx genes.
SEQ ID NO:29 - Partial sequence of strain 00-081-2291 wzy and wzx genes.
SEQ ID NO:30 - Partial sequence of strain 00S168 wzy and wzx genes.
SEQ ID NO:31 - Partial sequence of strain 00-280-1493 wzy and wzx genes.
SEQ ID NO:32 - Partial sequence of strain MA063073 wzy and wzx genes.
SEQ ID NO:33 - Partial sequence of strain NZSPN00/410 wzy and wzx genes.
SEQ ID NO:34 - Partial sequence of strain NZSPN01/243 wzy and wzx genes.
SEQ ID NO:35 - Partial sequence of strain MA063087 wzy and wzx genes.
SEQ ID NO:36 - Partial sequence of strain MA063207 wzy and wzx genes.
SEQ ID NO:37 - Partial sequence of strain 01S333 wzx gene.
SEQ ID NO:38 - Partial sequence of strain MA050663 wciW and wzx genes.



wo 2004/090159 



PCT/AU2004/000480 



14 

SEQ ID NO:39 - Partial sequence of strain 01S319 wciW and wzx genes.
SEQ ID NO:40 - Partial sequence of strain NZSPN00/353 wciW and wzx genes.
SEQ ID NO:41 - Partial sequence of strain MA062610 wciW and wzx genes.
SEQ ID NO:42 - Partial sequence of strain MA053392 wciW and wzx genes.
SEQ ID NO:43 - Partial sequence of strain NZSPN00/319 wciW and wzx genes.
SEQ ID NO:44 - Partial sequence of strain NZSPN01/278 wciW and wzx genes.
SEQ ID NO:45 - Partial sequence of strain 01S009 wciW and wzx genes.
SEQ ID NO:46 - Partial sequence of strain MA052628 wciW and wzx genes.
SEQ ID NO:47 - Partial sequence of strain 00-081-2291 cpsJ and wzy genes.
SEQ ID NO:48 - Partial sequence of strain 00-280-1493 cpsJ and wzy genes.
SEQ ID NO:49 - Partial sequence of strain NZSPN00/410 cpsJ and wzy genes.
SEQ ID NO:50 - Partial sequence of strain NZSPN01/243 cpsJ and wzy genes.
SEQ ID NO:51 - Partial sequence of strain MA063073 cpsJ and wzy genes.
SEQ ID NO:52 - Partial sequence of strain 00S168 cpsJ and wzy genes.
SEQ ID NO:53 - Partial sequence of strain MA063087 cpsJ and wzy genes.
SEQ ID NO:54 - Partial sequence of strain MA063207 cpsJ and wzy genes.
SEQ ID NO:55 - Partial sequence of strain 01S319 wzx and wzy genes.
SEQ ID NO:56 - Partial sequence of strain NZSPN00/353 wzx and wzy genes.
SEQ ID NO:57 - Partial sequence of strain MA062610 wzx and wzy genes.
SEQ ID NO:58 - Partial sequence of strain MA053392 wzx and wzy genes.
SEQ ID NO:59 - Partial sequence of strain NZSPN00/319 wzx and wzy genes.
SEQ ID NO:60 - Partial sequence of strain NZSPN01/278 wzx and wzy genes.
SEQ ID NO:61 - Partial sequence of strain MA050663 wzx and wzy genes.
SEQ ID NO:62 - Partial sequence of strain MA052628 wzx and wzy genes.
SEQ ID NO:63 - Partial sequence of strain 01S009 wzx and wzy genes.
SEQ ID NOs: 64 to 143 - Oligonucleotide primers provided in Table 2.
SEQ ID NOs: 144 to 333 - Oligonucleotide primers provided in Table 7.
SEQ ID NO:334* - Sequence of serotype 33C wzx gene. 
SEQ ID NO:335* - Sequence of serotype 10B wzx gene.
SEQ ID NO:336* - Sequence of serotype 10C wzx gene.
SEQ ID NO:337* - Sequence of serotype 10F wzx gene.
SEQ ID NO:338* - Sequence of serotype 11A wzx gene.
SEQ ID NO:339* - Sequence of serotype 11D wzx gene.
SEQ ID NO:340* - Sequence of serotype 12A wzx gene. 
SEQ ID NO:341* - Sequence of serotype 12B wzx gene. 
SEQ ID NO:342* - Sequence of serotype 12F wzx gene. 
SEQ ID NO:343* - Sequence of serotype 13 wzx gene.

SEQ ID NO:344* - Sequence of serotype 14 wzx gene. 

SEQ ID NO:345* - Sequence of serotype 15A wzx gene. 

SEQ ID NO:346* - Sequence of serotype 15B wzx gene.
SEQ ID NO:347* - Sequence of serotype 15C wzx gene.

SEQ ID NO:348* - Sequence of serotype 15F wzx gene. 

SEQ ID NO:349* - Sequence of serotype 16A wzx gene. 

SEQ ID NO:350* - Sequence of serotype 16F wzx gene. 

SEQ ID NO:351* - Sequence of serotype 17A wzx gene. 
SEQ ID NO:352* - Sequence of serotype 17F wzx gene.

SEQ ID NO:353* - Sequence of serotype 18A wzx gene. 

SEQ ID NO:354* - Sequence of serotype 18B wzx gene.

SEQ ID NO:355* - Sequence of serotype 18F wzx gene. 

SEQ ID NO:356* - Sequence of serotype 20 wzx gene. 
SEQ ID NO:357* - Sequence of serotype 22A wzx gene.
SEQ ID NO:358* - Sequence of serotype 22F wzx gene.
SEQ ID NO:359* - Sequence of serotype 23A wzx gene.
SEQ ID NO:360* - Sequence of serotype 23B wzx gene.
SEQ ID NO:361* - Sequence of serotype 24B wzx gene.
SEQ ID NO:362* - Sequence of serotype 25A wzx gene.
SEQ ID NO:363* - Sequence of serotype 25F wzx gene.
SEQ ID NO:364* - Sequence of serotype 27 wzx gene.
SEQ ID NO:365* - Sequence of serotype 28A wzx gene.
SEQ ID NO:366* - Sequence of serotype 28F wzx gene.
SEQ ID NO:367* - Sequence of serotype 29 wzx gene.
SEQ ID NO:368* - Sequence of serotype 31 wzx gene.
SEQ ID NO:369* - Sequence of serotype 32A wzx gene.
SEQ ID NO:370* - Sequence of serotype 32F wzx gene.
SEQ ID NO:371* - Sequence of serotype 33A wzx gene.
SEQ ID NO:372* - Sequence of serotype 33B wzx gene.
SEQ ID NO:373* - Sequence of serotype 10A wzx gene.
SEQ ID NO:374* - Sequence of serotype 9N wzx gene.
SEQ ID NO:375* - Sequence of serotype 34 wzx gene.
SEQ ID NO:376* - Sequence of serotype 35A wzx gene.
SEQ ID NO:377* - Sequence of serotype 35B wzx gene.
SEQ ID NO:378* - Sequence of serotype 35C wzx gene.
SEQ ID NO:379* - Sequence of serotype 35F wzx gene.

SEQ ID NO:380* - Sequence of serotype 36 wzx gene.

SEQ ID NO:381* - Sequence of serotype 38 wzx gene. 

SEQ ID NO:382* - Sequence of serotype 39 wzx gene. 
SEQ ID NO:383* - Sequence of serotype 40 wzx gene.

SEQ ID NO:384* - Sequence of serotype 41A wzx gene.

SEQ ID NO:385* - Sequence of serotype 41F wzx gene. 

SEQ ID NO:386* - Sequence of serotype 42 wzx gene. 

SEQ ID NO:387* - Sequence of serotype 43 wzx gene. 
SEQ ID NO:388* - Sequence of serotype 44 wzx gene.

SEQ ID NO:389* - Sequence of serotype 45 wzx gene. 

SEQ ID NO:390* - Sequence of serotype 46 wzx gene. 

SEQ ID NO:391* - Sequence of serotype 47A wzx gene.

SEQ ID NO:392* - Sequence of serotype 47F wzx gene. 
SEQ ID NO:393* - Sequence of serotype 48 wzx gene.

SEQ ID NO:394* - Sequence of serotype 48(1) wzx gene. 

SEQ ID NO:395* - Sequence of serotype 7A wzx gene. 

SEQ ID NO:396* - Sequence of serotype 7C wzx gene. 

SEQ ID NO:397* - Sequence of serotype 7F wzx gene. 
SEQ ID NO:398* - Sequence of serotype 9A wzx gene.

SEQ ID NO:399* - Sequence of serotype 9L wzx gene. 

SEQ ID NO:400* - Sequence of serotype 33D wzx gene. 

SEQ ID NO:401* - Sequence of serotype 33B wzy gene. 

SEQ ID NO:402* - Sequence of serotype lOB wzy gene. 
SEQ ID NO:403* - Sequence of serotype 10C wzy gene.

SEQ ID NO:404* - Sequence of serotype 10F wzy gene.

SEQ ID NO:405* - Sequence of serotype 11A wzy gene.

SEQ ID NO:406* - Sequence of serotype 11D wzy gene.

SEQ ID NO:407* - Sequence of serotype 12A wzy gene. 
SEQ ID NO:408* - Sequence of serotype 12B wzy gene.

SEQ ID NO:409* - Sequence of serotype 12F wzy gene. 

SEQ ID NO:410* - Sequence of serotype 13 wzy gene. 

SEQ ID NO:411* - Sequence of serotype 14 wzy gene.

SEQ ID NO:412* - Sequence of serotype 15A wzy gene. 
SEQ ID NO:413* - Sequence of serotype 15B wzy gene.

SEQ ID NO:414* - Sequence of serotype 15C wzy gene. 
SEQ ID NO:415* - Sequence of serotype 15F wzy gene.
SEQ ID NO:416* - Sequence of serotype 16A wzy gene.
SEQ ID NO:417* - Sequence of serotype 16F wzy gene.
SEQ ID NO:418* - Sequence of serotype 17A wzy gene.
SEQ ID NO:419* - Sequence of serotype 17F wzy gene.
SEQ ID NO:420* - Sequence of serotype 18A wzy gene.
SEQ ID NO:421* - Sequence of serotype 18B wzy gene.
SEQ ID NO:422* - Sequence of serotype 18F wzy gene.
SEQ ID NO:423* - Sequence of serotype 19C wzy gene.
SEQ ID NO:424* - Sequence of serotype 20 wzy gene.
SEQ ID NO:425* - Sequence of serotype 22A wzy gene.
SEQ ID NO:426* - Sequence of serotype 22F wzy gene.
SEQ ID NO:427* - Sequence of serotype 23A wzy gene.
SEQ ID NO:428* - Sequence of serotype 23B wzy gene.
SEQ ID NO:429* - Sequence of serotype 24B wzy gene.
SEQ ID NO:430* - Sequence of serotype 25A wzy gene.
SEQ ID NO:431* - Sequence of serotype 25F wzy gene.
SEQ ID NO:432* - Sequence of serotype 27 wzy gene.
SEQ ID NO:433* - Sequence of serotype 28A wzy gene.
SEQ ID NO:434* - Sequence of serotype 28F wzy gene.
SEQ ID NO:435* - Sequence of serotype 29 wzy gene.
SEQ ID NO:436* - Sequence of serotype 31 wzy gene.
SEQ ID NO:437* - Sequence of serotype 32A wzy gene.
SEQ ID NO:438* - Sequence of serotype 32F wzy gene.
SEQ ID NO:439* - Sequence of serotype 33A wzy gene.
SEQ ID NO:440* - Sequence of serotype 10A wzy gene.
SEQ ID NO:441* - Sequence of serotype 9N wzy gene.
SEQ ID NO:442* - Sequence of serotype 33D wzy gene.
SEQ ID NO:443* - Sequence of serotype 34 wzy gene.
SEQ ID NO:444* - Sequence of serotype 35A wzy gene.
SEQ ID NO:445* - Sequence of serotype 35B wzy gene.
SEQ ID NO:446* - Sequence of serotype 35C wzy gene.
SEQ ID NO:447* - Sequence of serotype 35F wzy gene.
SEQ ID NO:448* - Sequence of serotype 36 wzy gene.
SEQ ID NO:449* - Sequence of serotype 38 wzy gene.
SEQ ID NO:450* - Sequence of serotype 39 wzy gene.
SEQ ID NO:451* - Sequence of serotype 40 wzy gene.

SEQ ID NO:452* - Sequence of serotype 41A wzy gene.

SEQ ID NO:453* - Sequence of serotype 41F wzy gene. 

SEQ ID NO:454* - Sequence of serotype 42 wzy gene. 
SEQ ID NO:455* - Sequence of serotype 43 wzy gene.

SEQ ID NO:456* - Sequence of serotype 44 wzy gene. 

SEQ ID NO:457* - Sequence of serotype 45 wzy gene.

SEQ ID NO:458* - Sequence of serotype 46 wzy gene. 

SEQ ID NO:459* - Sequence of serotype 47A wzy gene.
SEQ ID NO:460* - Sequence of serotype 47F wzy gene.

SEQ ID NO:461* - Sequence of serotype 48 wzy gene. 

SEQ ID NO:462* - Sequence of serotype 48(1) wzy gene. 

SEQ ID NO:463* - Sequence of serotype 7A wzy gene. 

SEQ ID NO:464* - Sequence of serotype 7C wzy gene. 
SEQ ID NO:465* - Sequence of serotype 7F wzy gene.

SEQ ID NO:466* - Sequence of serotype 9A wzy gene. 

SEQ ID NO:467* - Sequence of serotype 9L wzy gene. 

SEQ ID NO:468* - Sequence of serotype 33C wzy gene. 

SEQ ID NO:469 - Sequence of serotype 9V wzx gene (GenBank accession no. AF402095).
SEQ ID NO:470 - Sequence of serotype 19B wzx gene (GenBank accession no. AF004325).
SEQ ID NO:471 - Sequence of serotype 19C wzx gene (GenBank accession no. AF105116).
SEQ ID NO:472 - Sequence of serotype 19F wzx gene (GenBank accession no. U09239).
SEQ ID NO:473 - Sequence of serotype 2 wzx gene (GenBank accession no. AF026471).
SEQ ID NO:474 - Sequence of serotype 23F wzx gene (GenBank accession no. AF057294).
SEQ ID NO:475 - Sequence of serotype 33F wzx gene (GenBank accession no. AFAJ006986).
SEQ ID NO:476 - Sequence of serotype 37 wzx gene (GenBank accession no. AJ131984).
SEQ ID NO:477 - Sequence of serotype 6A wzx gene (GenBank accession no. AY078347).
SEQ ID NO:478 - Sequence of serotype 6B wzx gene (GenBank accession no. AF316640).
SEQ ID NO:479 - Sequence of serotype 8 wzx gene (GenBank accession no. AF316641).
SEQ ID NO:480 - Sequence of serotype 18C wzx gene (GenBank accession no. AF316642).
SEQ ID NO:481 - Sequence of serotype 9V wzy gene (GenBank accession no. AF402095).
SEQ ID NO:482 - Sequence of serotype 19B wzy gene (GenBank accession no. AF004325).
SEQ ID NO:483 - Sequence of serotype 19F wzy gene (GenBank accession no. U09239).
SEQ ID NO:484 - Sequence of serotype 2 wzy gene (GenBank accession no. AF026471).
SEQ ID NO:485 - Sequence of serotype 23F wzy gene (GenBank accession no. AF057294).
SEQ ID NO:486 - Sequence of serotype 33F wzy gene (GenBank accession no. AFAJ006986).
SEQ ID NO:487 - Sequence of serotype 37 wzy gene (GenBank accession no. AJ131984).
SEQ ID NO:488 - Sequence of serotype 6A wzy gene (GenBank accession no. AY078347).
SEQ ID NO:489 - Sequence of serotype 6B wzy gene (GenBank accession no. AF316640).
SEQ ID NO:490 - Sequence of serotype 8 wzy gene (GenBank accession no. AF316641).
SEQ ID NO:491 - Sequence of serotype 18C wzy gene (GenBank accession no. AF316642).

SEQ ID NO:492 - Consensus sequence for the 3' end of the cpsA gene and the 5' end of the
cpsB gene of the S. pneumoniae strains that were analysed.

* Indicates that these sequences were extracted from unannotated sequence data from
the Sanger Institute website.

DETAILED DESCRIPTION OF THE INVENTION
Definitions

Unless defined otherwise, all technical and scientific terms used herein have the
same meaning as commonly understood by one of ordinary skill in the art (e.g., in cell
culture, molecular genetics, nucleic acid chemistry, hybridization techniques and
biochemistry).

As used herein, the term "nucleotide sequence between the 3' end of the cpsA
gene and the 5' end of the cpsB gene" at least refers to the region spanning from
nucleotide 2470 to nucleotide 3268 of Figure 1. Figure 1 provides the genomic
sequence of the cpsA (wzg) and cpsB (wzh) genes of serotype 4 as published by Jiang et al.
(2001) and submitted as GenBank Accession Number AF316639. As the skilled
addressee would be aware, the same region from other serotypes of S. pneumoniae can
be identified using standard techniques such as DNA cloning, sequencing and
nucleotide sequence alignment. Such techniques are described in further detail in the
Examples section. In addition, these techniques have been used to determine the
nucleotide sequence between the 3' end of the cpsA gene and the 5' end of the cpsB
gene from many different serotypes of S. pneumoniae, the results of which, including a
consensus sequence for this region, are provided in Figure 2.
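Because the region is defined by 1-based, inclusive coordinates on the Figure 1 (AF316639) sequence, pulling it out of a sequence file involves an off-by-one conversion. The Python sketch below assumes the serotype 4 sequence is already loaded as a plain string numbered exactly as in Figure 1; the function name `extract_cpsa_cpsb_region` is hypothetical. Positions 2470 to 3268 inclusive span 799 nucleotides. For other serotypes the coordinates would first have to be located by alignment, as described above.

```python
def extract_cpsa_cpsb_region(sequence: str, start: int = 2470, end: int = 3268) -> str:
    """Return nucleotides start..end (1-based, inclusive) of `sequence`.

    Hypothetical helper: assumes `sequence` is numbered as in Figure 1.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the supplied sequence")
    # Python slices are 0-based and end-exclusive, so only the start is shifted.
    return sequence[start - 1:end]
```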

As used herein, the term "primer pairs that amplify the same region, or
diagnostic portion thereof, from the genome of a strain of S. pneumoniae", or variations
thereof, refers to the capability of the skilled addressee to determine where the
identified primers of the claimed invention hybridize to the S. pneumoniae genome of a
particular strain(s), and the subsequent ability to design alternate primers which can be
used for the same purpose as the primers defined herein. Typically, these alternate
primers will hybridize to the same region of the genome but be larger or smaller in size,
or will hybridize to a region of the genome which is in close proximity, for example
within 500 basepairs, to where the specifically defined primers hybridize. Naturally, the
term "diagnostic portion thereof" refers to the alternate primers being capable of
amplifying only a portion of the region amplified by the defined primers, but still
enough of that region to determine the serotype of a particular S. pneumoniae isolate.
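Locating where a defined primer binds, and checking that an alternate primer falls within the 500 basepair window mentioned above, can be sketched as follows. This is a simplification under stated assumptions: only exact matches on either strand are considered (real hybridization tolerates some mismatches), positions are 0-based start coordinates, and the function names are hypothetical.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(genome: str, primer: str) -> List[int]:
    """0-based starts where the primer or its reverse complement occurs exactly."""
    sites = set()
    for target in (primer, reverse_complement(primer)):
        pos = genome.find(target)
        while pos != -1:
            sites.add(pos)
            pos = genome.find(target, pos + 1)
    return sorted(sites)


def binds_nearby(genome: str, defined: str, alternate: str, window: int = 500) -> bool:
    """True if any exact site of `alternate` lies within `window` bp of a site of `defined`."""
    return any(
        abs(a - d) <= window
        for d in binding_sites(genome, defined)
        for a in binding_sites(genome, alternate)
    )
```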

General Techniques

Unless otherwise indicated, the recombinant DNA and immunological
techniques utilized in the present invention are standard procedures, well known to
those skilled in the art. Such techniques are described and explained throughout the
literature in sources such as J. Perbal, A Practical Guide to Molecular Cloning, John
Wiley and Sons (1984); J. Sambrook et al., Molecular Cloning: A Laboratory Manual,
Cold Spring Harbour Laboratory Press (1989); T.A. Brown (editor), Essential
Molecular Biology: A Practical Approach, Volumes 1 and 2, IRL Press (1991); D.M.
Glover and B.D. Hames (editors), DNA Cloning: A Practical Approach, Volumes 1-4,
IRL Press (1995 and 1996); F.M. Ausubel et al. (editors), Current Protocols in
Molecular Biology, Greene Pub. Associates and Wiley-Interscience (1988, including
all updates until present); Ed Harlow and David Lane (editors), Antibodies: A
Laboratory Manual, Cold Spring Harbour Laboratory (1988); and J.E. Coligan et al.
(editors), Current Protocols in Immunology, John Wiley & Sons (including all updates
until present), and are incorporated herein by reference.

Detection of Polymorphisms

Any technique known in the art can be used to detect a polymorphism described
herein. Examples of such techniques include, but are not limited to, sequencing of the
DNA at one or more of the relevant positions; differential hybridisation of an
oligonucleotide probe designed to hybridise at the relevant positions of a particular S.
pneumoniae serotype(s); denaturing gel electrophoresis following digestion with an
appropriate restriction enzyme, preferably following amplification of the relevant DNA
regions; S1 nuclease sequence analysis; non-denaturing gel electrophoresis, preferably
following amplification of the relevant DNA regions; conventional RFLP (restriction
fragment length polymorphism) assays; selective DNA amplification using
oligonucleotides which are matched for a particular S. pneumoniae serotype(s) and
unmatched for other S. pneumoniae serotype(s); or the selective introduction of a
restriction site using a PCR (or similar) primer matched for a particular S. pneumoniae
serotype(s), followed by a restriction digest. As outlined above, it is preferred that the
nucleotide sequence between the 3' end of the cpsA gene and the 5' end of the cpsB
gene is characterized by DNA sequencing, whilst the analysis of at least a portion of the
wzy and/or wzx gene is performed by procedures involving the detection of
amplification products.

In one embodiment, the informative serotyping information provided herein is
adapted to produce a molecular capsular sequence typing database as generally
described by Robertson et al. (2004).

PCR-based methods of detection may rely upon the use of primer pairs, at least
one of which binds specifically to a region of interest in one or more, but not all,
serotypes. Unless both primers bind, no PCR product will be obtained. Consequently,
the presence or absence of a specific PCR product may be used to determine the
presence of a sequence indicative of a specific S. pneumoniae serotype(s). However, as
mentioned, only one primer need correspond to a region of heterogeneity in the
genes/regions of interest. The other primer may bind to a conserved or heterogeneous
region within said gene/region, or even to a region within another part of the S.
pneumoniae genome, whether said region is conserved or heterogeneous between
serotypes.
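Interpreting a panel of such presence/absence reactions amounts to set logic: a product restricts the call to the serotypes that pair amplifies, and the absence of a product excludes them. The sketch below assumes each primer pair's target serotypes are already known; the pair names and target sets used in any example are hypothetical, not primers from this specification. In practice a negative reaction could also be a failed PCR, which is why an internal control (for example, a conserved target such as psaA or pneumolysin) would normally be run alongside.

```python
from typing import Iterable, Mapping, Set


def candidate_serotypes(
    observed: Mapping[str, bool],
    primer_targets: Mapping[str, Set[str]],
    all_serotypes: Iterable[str],
) -> Set[str]:
    """Serotypes consistent with a panel of presence/absence PCR results.

    observed maps a primer-pair name to True if its specific product was seen;
    primer_targets maps the same name to the serotypes that pair should amplify.
    """
    candidates = set(all_serotypes)
    for pair, product_seen in observed.items():
        targets = primer_targets[pair]
        if product_seen:
            candidates &= targets  # a product restricts the call to this pair's targets
        else:
            candidates -= targets  # no product excludes this pair's targets
    return candidates
```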
Alternatively, primers that bind to conserved regions of the S. pneumoniae
genome but which flank a region whose length varies between serotypes may be used.
In this case, a PCR product will always be obtained when S. pneumoniae bacteria are
present, but the size of the PCR product varies between serotypes. Examples of such
varying amplification product lengths are disclosed herein in relation to the wzy and
wzx genes.
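With flanking conserved primers, the call reduces to matching an observed product length against the expected length for each serotype, within the sizing error of the electrophoresis method used. A minimal sketch follows; the expected sizes and the 10 bp tolerance are hypothetical placeholders, not values from this specification. Returning every match within tolerance, rather than only the nearest, makes it explicit when two serotypes cannot be separated by size alone.

```python
from typing import Dict, List


def call_by_amplicon_size(
    observed_bp: int, expected_bp: Dict[str, int], tolerance_bp: int = 10
) -> List[str]:
    """Serotypes whose expected product size lies within tolerance of the observed size."""
    return sorted(
        serotype
        for serotype, size in expected_bp.items()
        if abs(size - observed_bp) <= tolerance_bp
    )
```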

Furthermore, a combination of specific binding of one or both primers and
variation in the length of the PCR product may be used as a means of identifying
particular molecular serotypes.

In some cases, PCR and other specific hybridisation-based serotyping methods
will involve the use of nucleotide primers/probes which bind specifically to a region of
the genome of a S. pneumoniae serotype which includes a nucleotide which varies
between two or more serotypes. Thus the primers/probes may comprise a sequence
which is complementary to one of such regions. Where positions of heterogeneity are
close together (for instance within 5 or so nucleotides), it may be desirable to use a
primer/probe which hybridises specifically to a region of the S. pneumoniae genome
that comprises two or more positions of heterogeneity. Such primers/probes are likely
to have improved specificity and to reduce the likelihood of false positives.
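Finding positions of heterogeneity that lie close enough together to be covered by a single primer/probe can be sketched from a multiple sequence alignment. This assumes the sequences are already aligned to equal length, treats a gap character as just another state, and interprets "within 5 or so nucleotides" as neighbouring variable columns being at most 5 positions apart; the function names are hypothetical.

```python
from typing import List, Sequence


def polymorphic_positions(aligned: Sequence[str]) -> List[int]:
    """0-based alignment columns where the sequences do not all agree."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    return [i for i in range(length) if len({seq[i].upper() for seq in aligned}) > 1]


def clustered_positions(positions: Sequence[int], max_gap: int = 5) -> List[List[int]]:
    """Group sorted positions whose neighbours are at most max_gap apart; keep groups of 2+."""
    groups: List[List[int]] = []
    for pos in positions:
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)  # close to the previous variable column: same probe window
        else:
            groups.append([pos])
    return [g for g in groups if len(g) >= 2]
```

Each returned group marks a candidate probe window spanning two or more positions of heterogeneity.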

PCR techniques that utilize fluorescent dyes may be used in the detection
methods of the present invention. These include, but are not limited to, the following
five techniques.

i) Fluorescent dyes (e.g. ethidium bromide or SYBR Green I) can be used to detect
specific PCR-amplified double stranded DNA product.

ii) The 5' nuclease (TaqMan) assay can be used, which utilizes a specially
constructed probe whose fluorescence is quenched until it is released by the nuclease
activity of the Taq DNA polymerase during extension of the PCR product.

iii) Assays based on Molecular Beacon technology can be used, which rely on a
specially constructed oligonucleotide that quenches its own fluorescence when
self-hybridized (the fluorescent dye and quencher molecule are adjacent). Upon
hybridization to a specific amplified PCR product, fluorescence is increased due to
separation of the quencher from the fluorescent molecule.

iv) Assays based on Amplifluor (Intergen) technology can be used, which utilize
specially prepared primers in which, again, fluorescence is quenched due to
self-hybridization. In this case, fluorescence is released during PCR amplification by
extension through the primer sequence, which results in the separation of the
fluorescent and quencher molecules.
v) Assays that rely on an increase in fluorescence resonance energy transfer (FRET)
can be used, which utilize two specially designed adjacent primers carrying different
fluorochromes on their ends. When these primers anneal to a specific PCR-amplified
product, the two fluorochromes are brought together, and excitation of one
fluorochrome results in an increase in fluorescence of the other fluorochrome.

Probes and primers may be fragments of DNA isolated from nature or may be
synthetic. In one embodiment, primers/probes have a high melting temperature of
>70°C so that they may be used in rapid cycle PCR. Preferably, the primers/probes
comprise at least 10, 15 or 20 nucleotides. Typically, primers/probes consist of fewer
than 50 or 30 nucleotides. Primers/probes are generally polynucleotides comprising
deoxynucleotides. They may also be polynucleotides which include within them
synthetic or modified nucleotides. A number of different types of modification to
oligonucleotides are known in the art. These include methylphosphonate and
phosphorothioate backbones, and the addition of acridine or polylysine chains at the 3'
and/or 5' ends of the molecule. For the purposes of the present invention, it is to be
understood that the polynucleotides described herein may be modified by any method
available in the art. Primers/probes may be labelled with any suitable detectable label
such as radioactive atoms, fluorescent molecules or biotin.
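A quick screen of candidate primers against the length and melting-temperature criteria above can be sketched with two textbook approximations: the Wallace rule, 2(A+T) + 4(G+C), for oligonucleotides shorter than 14 nt, and the basic formula 64.9 + 41(G+C - 16.4)/N for longer ones. These estimates ignore salt and oligonucleotide concentration, and nearest-neighbour models are considerably more accurate; the function names and the choice of formulas are assumptions for illustration, not the method used by the inventors.

```python
def basic_tm(primer: str) -> float:
    """Rough melting temperature (°C) from base composition only."""
    seq = primer.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only unambiguous A/C/G/T bases are supported")
    gc = seq.count("G") + seq.count("C")
    at = len(seq) - gc
    if len(seq) < 14:
        return 2 * at + 4 * gc  # Wallace rule for short oligonucleotides
    return 64.9 + 41 * (gc - 16.4) / len(seq)


def meets_rapid_cycle_criteria(
    primer: str, min_len: int = 10, max_len: int = 50, min_tm: float = 70.0
) -> bool:
    """Length between min_len and max_len nt and estimated Tm above min_tm."""
    return min_len <= len(primer) <= max_len and basic_tm(primer) > min_tm
```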

The primers can be synthesized using techniques which are well known in the art.
Generally, the primers can be made using commercially available synthesizing
machines.

If required, in order to facilitate subsequent cloning of amplified sequences,
primers may have restriction enzyme sites appended to their 5' ends. In that case, all
nucleotides of the primers are derived from the gene sequence of interest or sequences
adjacent to that gene, except the few nucleotides necessary to form a restriction enzyme
site. Such enzymes and sites are well known in the art.
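Appending a site can be sketched as simple string assembly. The EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences are standard; the two-base "GC" clamp is an assumption reflecting the common practice of adding a few extra bases, because many enzymes cut poorly at the very end of a DNA fragment. The helper names are hypothetical. It is also worth confirming that the chosen enzyme does not cut within the amplicon itself.

```python
RESTRICTION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def add_5prime_site(primer: str, enzyme: str, clamp: str = "GC") -> str:
    """Prepend clamp + recognition site to the 5' end of a gene-specific primer."""
    return clamp.upper() + RESTRICTION_SITES[enzyme] + primer.upper()


def cuts_internally(amplicon: str, enzyme: str) -> bool:
    """True if the site occurs in the amplicon. EcoRI and BamHI sites are
    palindromic, so checking one strand is sufficient for these two enzymes."""
    return RESTRICTION_SITES[enzyme] in amplicon.upper()
```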

A sample to be typed for the presence and/or identification of a S. pneumoniae
serotype may be from a bacterial culture or a clinical sample from a patient, typically a
human patient. Clinical samples may be cultured to produce a bacterial culture.
However, it is also possible to test clinical samples directly without a culturing step.

The methods of the present invention can be used in a multi-step serotyping
strategy. An example of such a multi-step serotyping strategy (algorithm) is shown in
Table 6. However, a variety of other strategies are envisaged and can be designed by
the skilled person using the sequence heterogeneity information presented herein. In
particular, it is preferred that the serotyping procedure comprise at least one analysis
step based on analysing one or more regions between the 3' end of the cpsA gene and
the 5' end of the cpsB gene. This analysis may optionally be combined with an analysis
of one or more regions within the wzy and/or wzx genes.
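A generic two-step strategy of this kind can be sketched as follows; it is not a reproduction of Table 6. Step 1 assigns the closest reference cpsA-cpsB sequence type by Hamming distance over an existing alignment. Step 2, applied only when that type corresponds to more than one serotype, narrows the candidates using wzy/wzx assay results. The reference types, serotype mappings and assay names used in any example are hypothetical, and ties in step 1 simply go to the first reference listed.

```python
from typing import Mapping, Set, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched columns between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def two_step_serotype(
    query: str,
    reference_types: Mapping[str, str],
    type_to_serotypes: Mapping[str, Set[str]],
    wz_results: Mapping[str, bool],
    wz_targets: Mapping[str, Set[str]],
) -> Tuple[str, Set[str]]:
    # Step 1: closest cpsA-cpsB reference sequence type (first listed wins ties).
    seq_type = min(reference_types, key=lambda t: hamming(query, reference_types[t]))
    candidates = set(type_to_serotypes[seq_type])
    # Step 2: only if step 1 is ambiguous, apply the wzy/wzx assay results.
    if len(candidates) > 1:
        for assay, positive in wz_results.items():
            targets = wz_targets[assay]
            candidates = candidates & targets if positive else candidates - targets
    return seq_type, candidates
```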

Microarrays

Analysis of S. pneumoniae genomic sequences using the above techniques may
take place in solution followed by standard resolution using methods such as gel
electrophoresis. However, in a preferred aspect of the invention, the primers/probes are
immobilised onto a solid substrate to form arrays.

The polynucleotide probes are typically immobilised onto or in discrete regions
of a solid substrate. The substrate may be porous to allow immobilisation within the
substrate or substantially non-porous, in which case the probes are typically
immobilised on the surface of the substrate. Examples of suitable solid substrates
include flat glass (such as borosilicate glass), silicon wafers, mica, ceramics and
organic polymers such as plastics, including polystyrene and polymethacrylate. It may
also be possible to use semi-permeable membranes such as nitrocellulose or nylon
membranes, which are widely available. The semi-permeable membranes may be
mounted on a more robust solid surface such as glass. The surfaces may optionally be
coated with a layer of metal, such as gold, platinum or other transition metal.

Preferably, the solid substrate is a material having a rigid or semi-rigid
surface. In preferred embodiments, at least one surface of the substrate will be
substantially flat, although in some embodiments it may be desirable to physically
separate synthesis regions for different polymers with, for example, raised regions or
etched trenches. It is also preferred that the solid substrate is suitable for the high
density application of DNA sequences in discrete areas of typically from 50 to 100 µm,
giving a density of 10,000 to 40,000 cm⁻².
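The density figures follow from the feature size if the 50 to 100 µm figure is read as the centre-to-centre spacing of a square grid (an assumption): 1 cm is 10,000 µm, so a 100 µm pitch gives 100 features per side and 10,000 per cm², while a 50 µm pitch gives 200 per side and 40,000 per cm². The hypothetical helper below makes that arithmetic explicit.

```python
def features_per_cm2(pitch_um: float) -> float:
    """Features per square centimetre for a square grid with the given spacing in µm."""
    um_per_cm = 10_000
    per_side = um_per_cm / pitch_um  # features along one 1 cm edge
    return per_side ** 2
```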

The solid substrate is conveniently divided up into sections. This may be
achieved by techniques such as photoetching, or by the application of hydrophobic
inks, for example Teflon-based inks (Cel-line, USA). Discrete positions, in which
different probes are located, may have any convenient shape, e.g., circular,
rectangular, elliptical, wedge-shaped, etc.

Attachment of the library sequences to the substrate may be by covalent or
non-covalent means. The library sequences may be attached to the substrate via a layer
of molecules to which the library sequences bind. For example, the probes may be
labelled with biotin and the substrate coated with avidin and/or streptavidin. A
convenient feature of using biotinylated probes is that the efficiency of coupling to the
solid substrate can be determined easily. Since the polynucleotide probes may bind
only poorly to some solid substrates, it is often necessary to provide a chemical
interface between the solid substrate (such as in the case of glass) and the probes.
Thus, the surface of the substrate may be prepared by, for example, coating with a
chemical that increases or decreases the hydrophobicity or coating with a chemical that
allows covalent linkage of the polynucleotide probes. Some chemical coatings may
both alter the hydrophobicity and allow covalent linkage. Hydrophobicity on a solid
substrate may readily be increased by silane treatment or other treatments known in the
art. Examples of suitable chemical coatings include polylysine and
poly(ethyleneimine). Further details of methods for the attachment of probes to solid
substrates are provided in US 6,248,521.

Techniques for producing immobilised arrays of nucleic acid molecules have been
described in the art. A useful review is provided in Schena et al. (1998), which also
gives references for the techniques described therein.

Microarray-manufacturing technologies fall into two main categories: synthesis
and delivery. In the synthesis approaches, microarrays are prepared in a stepwise
fashion by the in situ synthesis of nucleic acids from biochemical building blocks. With
each round of synthesis, nucleotides are added to growing chains until the desired
length is achieved. A number of prior art methods describe how to synthesise
single-stranded nucleic acid molecule libraries in situ, using for example masking
techniques (photolithography) to build up various permutations of sequences at the
various discrete positions on the solid substrate. US 5,837,832 describes an improved
method for producing DNA arrays immobilised to silicon substrates based on very large
scale integration technology. In particular, US 5,837,832 describes a strategy called
"tiling" to synthesize specific sets of probes at spatially-defined locations on a
substrate, which may be used to produce the immobilised DNA libraries of the present
invention. US 5,837,832 also provides references for earlier techniques that may also be
used.

The delivery technologies, by contrast, use the exogenous deposition of
prepared biochemical substances for chip fabrication. For example, DNA may be
printed directly onto the substrate using robotic devices equipped with either pins
(mechanical microspotting) or piezoelectric devices (ink jetting). In mechanical
microspotting, a biochemical sample is loaded into a spotting pin by capillary action,
and a small volume is transferred to a solid surface by physical contact between the pin
and the solid substrate. After the first spotting cycle, the pin is washed and a second
sample is loaded and deposited to an adjacent address. Robotic control systems and
multiplexed printheads allow automated microarray fabrication. Ink jetting involves
loading a biochemical sample, such as a polynucleotide, into a miniature nozzle
equipped with a piezoelectric fitting; an electrical current is used to expel a precise
amount of liquid from the jet onto the substrate. After the first jetting step, the jet is
washed and a second sample is loaded and deposited to an adjacent address. A
repeated series of cycles with multiple jets enables rapid microarray production.

In one embodiment, the microarray is a high density array, comprising greater
than about 50, preferably greater than about 100 or 200, different nucleic acid probes.
Such high density arrays comprise a probe density of greater than about 50, preferably
greater than about 500, more preferably greater than about 1,000, most preferably
greater than about 2,000 different nucleic acid probes per cm². The array may further
comprise mismatch control probes and/or reference probes (such as positive controls).

Microarrays of the invention will typically comprise a plurality of 
primers/probes as described above. The primers/probes may be grouped on the array in 
any order. 

Elements in an array may contain only one type of probe/primer or a number of
different probes/primers.

Detection of binding of S. pneumoniae DNA to immobilised probes/primers
may be performed using a number of techniques. For example, the immobilised probes,
which are specific for one or a number of serotypes, may function as capture probes.
Following binding of the genomic DNA to the array, the array is washed and incubated
with one or more labelled detection probes which hybridise specifically to regions of
the S. pneumoniae genome which are conserved (for example, the S. pneumoniae psaA
or pneumolysin probes/primers described herein could be utilized for this purpose).
The binding of these detection probes may then be determined by detecting the
presence of the label. For example, the label may be a fluorescent label and the array
may be placed in an X-Y reader under a charge-coupled device (CCD) camera.

Other techniques include labelling the genomic DNA prior to contact with the 
array (using nick-translation and labelled dNTPs for example). Binding of the genomic 
DNA can then be detected directly. 
It is also possible to employ a single PCR amplification step using labelled
dNTPs. In this embodiment, the genomic DNA fragment binds to a first primer present
in the array. The addition of polymerase, dNTPs (including some labelled dNTPs) and a
second primer results in synthesis of a PCR product incorporating labelled nucleotides.
The labelled PCR fragment captured on the plate may then be detected.
A number of available detection techniques do not require labels but instead rely
on changes in mass upon ligand binding (e.g. surface plasmon resonance, SPR). The
principles of SPR and the types of solid substrates required for use in SPR (e.g.
BIACore chips) are described in Ausubel et al., Short Protocols in Molecular Biology
(1999) 4th Ed., John Wiley & Sons, Inc.

Examples of the utilization of microarrays in genotyping include the use of
microarrays to differentiate between closely related Cryptosporidium parvum isolates
and Cryptosporidium species (Straub et al., 2002), the use of microarrays to
differentiate between species of Listeria (Volokhov et al., 2002), and the use of
microarrays to differentiate strains within the species Staphylococcus aureus (van
Leeuwen et al., 2003). The detection principles applied in these studies can be used
with the polymorphisms/primers/probes identified by the present inventors to identify
different serotypes of S. pneumoniae in a sample.

In the present instance, according to the 800 bp cpsA-cpsB alignment results
(Figure 2), regions such as the first 20 nucleotides provided in Figure 2 are scanned to
see whether they contain polymorphisms. Where polymorphisms occur, allele-specific
probes can be designed for each "type" (and named, for example, 1-1, 1-2, etc.), so that
together they cover the cpsA-cpsB region for all the known sequence types. The
combined hybridisation results of all these allele-specific probes (up to about 20
alleles x 40-50 regions = 800 to 1000 probes altogether) will define microarray
hybridisation types, analogous to MLST profiles (e.g. 1-0-10), which would be nearly
equivalent to the sequencing results. Bioinformatics software can then determine which
sequence type the specimen/strain belongs to.

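The probe-design and typing scheme described above can be sketched in Python. This is a hypothetical illustration, not the inventors' bioinformatics software: the function names, the toy 16-nucleotide alignment with 4-nucleotide windows, and the conventions of numbering alleles in order of first appearance, recording 0 for a window without polymorphism and -1 for a window whose signal matches no single known allele, are all assumptions. Probes are named window-allele (1-1, 1-2, ...) as in the text, and a specimen's profile is matched to the reference type(s) that differ from it in the fewest windows.

```python
# Hypothetical sketch of allele-specific probe design and profile matching.
# Alignment, names and numbering conventions are illustrative assumptions.

def window_alleles(alignment: dict[str, str], width: int = 20) -> list[dict[str, int]]:
    """For each non-overlapping window, number the distinct sequences (alleles)."""
    length = len(next(iter(alignment.values())))
    tables = []
    for start in range(0, length, width):
        table: dict[str, int] = {}
        for seq in alignment.values():
            # a new window sequence gets the next allele number, starting at 1
            table.setdefault(seq[start:start + width], len(table) + 1)
        if len(table) == 1:
            # monomorphic window: no probe is needed, record allele 0
            table = {key: 0 for key in table}
        tables.append(table)
    return tables


def probe_names(tables: list[dict[str, int]]) -> list[str]:
    """One probe per allele of each polymorphic window, named window-allele."""
    return [f"{w}-{a}" for w, table in enumerate(tables, start=1)
            for a in table.values() if a > 0]


def profile(seq: str, tables: list[dict[str, int]], width: int = 20) -> tuple[int, ...]:
    """Allele profile of a sequence; -1 marks a window sequence not seen before."""
    return tuple(table.get(seq[i * width:(i + 1) * width], -1)
                 for i, table in enumerate(tables))


def profile_from_hybridisation(positive: set[str],
                               tables: list[dict[str, int]]) -> tuple[int, ...]:
    """Convert the set of probes giving a positive signal into an allele profile."""
    result = []
    for w, table in enumerate(tables, start=1):
        if all(a == 0 for a in table.values()):
            result.append(0)
            continue
        hits = [a for a in table.values() if f"{w}-{a}" in positive]
        # no signal, or more than one signal, in a window is reported as -1
        result.append(hits[0] if len(hits) == 1 else -1)
    return tuple(result)


def closest_types(query: tuple[int, ...],
                  references: dict[str, tuple[int, ...]]) -> tuple[list[str], int]:
    """Reference types with the fewest differing windows, and that number."""
    def distance(p: tuple[int, ...]) -> int:
        return sum(a != b for a, b in zip(query, p))
    best = min(distance(p) for p in references.values())
    return sorted(n for n, p in references.items() if distance(p) == best), best


# Toy alignment: four 4-nt windows, the last one monomorphic (hypothetical data).
ALIGNMENT = {
    "ST-A": "ACGTAAAATTTTGGCC",
    "ST-B": "ACGTAAACTTTTGGCC",
    "ST-C": "ACGAAAAATTTGGGCC",
}
TABLES = window_alleles(ALIGNMENT, width=4)
REFERENCES = {name: profile(seq, TABLES, width=4) for name, seq in ALIGNMENT.items()}

if __name__ == "__main__":
    print(probe_names(TABLES))
    signals = {"1-1", "2-2", "3-1"}
    print(closest_types(profile_from_hybridisation(signals, TABLES), REFERENCES))
```

For a real cpsA-cpsB alignment the same functions would be called with the 800 bp sequences and width=20, giving 40 windows; with up to about 20 alleles per window this is of the same order as the 800-1000 probes estimated above.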
Kits 

In one embodiment, kits of the present invention include, in an amount
sufficient for at least one assay, a polynucleotide probe of the invention which
preferentially hybridizes to a target nucleic acid sequence in a test sample under
hybridization assay conditions. Kits containing multiple probes are also contemplated
by the present invention, where the multiple probes are designed to target different
nucleic acid sequences from different S. pneumoniae serotypes and may include
distinct labels which permit the probes to be differentially detected in a test sample.
Kits according to the present invention may further comprise at least one of the
following: (i) one or more amplification primers for amplifying a target sequence
contained in or derived from the target nucleic acid; (ii) a capture probe for isolating
and purifying target nucleic acid present in a test sample; and (iii) if a capture probe is
included, a solid support material (e.g., magnetically responsive particles) for
immobilizing the capture probe, either directly or indirectly, in a test sample. Kits of
the present invention may further include one or more helper probes.

Typically, the kits will also include instructions recorded in a tangible form
(e.g., on paper or an electronic medium) for using the packaged
polynucleotide in a detection assay for determining the presence or amount of a target
nucleic acid sequence in a test sample. The assay described in the written instructions
may include steps for isolating and purifying the target nucleic acid prior to detection
with the polynucleotide probe, and/or amplifying a target sequence contained in the
target nucleic acid. The instructions will typically indicate the reagents and/or
concentrations of reagents and at least one assay method parameter, which might be, for
example, the relative amounts of reagents to use per amount of sample. In addition,
specifics such as maintenance, time periods, temperature and buffer conditions may
also be included.

Uses

As discussed above, S. pneumoniae is a leading cause of morbidity and mortality,
causing invasive disease such as meningitis and pneumonia as well as more localised
disease such as acute otitis media and sinusitis. Continued surveillance is critical to monitor
vaccine efficacy and changes in the incidence and distribution of colonising and invasive
serotypes. Any increase in disease caused by previously uncommon nonvaccine serotypes
could necessitate a change in vaccine composition. Thus, the detection methods,
probes/primers and microarrays of the invention may be used to monitor the epidemiology
of invasive S. pneumoniae infections, to assist in disease control and to inform vaccine
policy.

The molecular typing methods of the invention may also assist in
comprehensive serotype identification, which will be useful for the epidemiological and
related studies needed to monitor S. pneumoniae before and after
introduction of S. pneumoniae vaccines.

EXAMPLES

EXAMPLE 1 - Serotyping based on the polymorphisms of the 3' end of the cpsA
gene and the 5' end of the cpsB gene, combined in some instances with the analysis
of the wzx and/or wzy genes

MATERIALS AND METHODS

Pneumococcal reference panels (Table 1)

Reference panels 1-4, which consisted of 118 isolates, were kindly provided and
serotyped by colleagues in Australia and Canada. All had been serotyped using the
standard Quellung method and included all 23 serotypes represented in the
polysaccharide vaccine and 28 additional serotypes; there were multiple isolates of 40
serotypes and five isolates that could not be serotyped with available antisera.
Reference panel 5 consisted of 21 invasive isolates from our diagnostic laboratory at
the Centre for Infectious Diseases and Microbiology (CIDM), Sydney, for which
serotypes were known at the beginning of the study. These five reference panels were
used for the development and preliminary evaluation of molecular capsular sequence
typing methods. Panels 2 and 4 were initially tested by molecular capsular sequence
typing without knowledge of the conventional serotyping (CS) results.

Clinical isolates 

179 consecutive S. pneumoniae clinical isolates from normally sterile sites,
collected by the CIDM diagnostic laboratory during the period January 1999 to June 2001,
were studied; 21 were randomly selected to make up reference panel 5 (see
above). Dr Diana Martin, Institute of Environmental Science and Research (ESR),
Wellington, New Zealand, provided 103 clinical isolates from diagnostic laboratories
throughout New Zealand. Clinical isolates were initially tested using the MCT method
without knowledge of their CS results (single-blind study). Isolates were retrieved from
storage by subculture on blood agar plates (Columbia II agar base supplemented with
5% horse blood) and incubated overnight at 37°C in a CO2 incubator.
Table 1. Conventional serotyping (CS) and molecular capsular typing (MCT) results
of S. pneumoniae strains used in this study.

Strain number     CS [1]              MCT-Seq [2]         MCT-PCR [2]        GenBank accession numbers [2]

Reference panel 1 [3], Queensland
00S001            19F                 19F                 19F                AF532666
00S002            6B                  6B-q                6B                 AF532705; AY163180, AY163190
00S006            19A                 19A                 19A                AF532663
00S00[illegible]                      23F-g                                  AF532677; AY163214, AY163232
00S014            1                   1                   1                  AF532632
00S016            9V                  9V                  9V                 AF532710
00S023            5                   5-q                                    AF532697
00S033            17F                 17F-35B                                AF532657
00S036            11A                 11A-q                                  AF532637
00S042            18C                 18C/18B             18C                AF532661
00S059            9N                  9N                                     AF532709
00S063            12F                 12F                                    AF532640
00S067            8                   8                   8                  AF532708
00S124            7F                  7F                                     AF532707
00S154            15B                 15B-q                                  AF532649
00S159            4                   4                   4                  [illegible]
[illegible]       33F                 33F-q               33F/37             AF532687; AY163199, AY163221
00S246            22F                 22F                                    AF532673
00S259            2                   2-q                 2                  AF532669
00S300            22A                 22A                                    AF532672
01S009            18C                 18C/18B             18C
01S020            7C                  7C                                     AF532706
01S043            10A                 10A-q                                  AF532633
01S143            3                   3                   3                  AF532682
01S146            10F                 10F                                    AF532635
01S305            20                  20/13                                  AF532670
01S319            18A                 18A                 18C                AF532658; AY163208, AY163224
01S333            33B                 33B                 33F-X; 33F-Y-NEG   AF532686
01S358            35B                 35B                                    AF532691
01S666            14                  14-g                14                 AF532643
01S682            16F                 16F                                    AF532653
01S691            15C                 15C-q                                  AF532651
01S753            4                   4                   4                  AF532693

Reference panel 2 [4], Victoria
0013856           35B                 35B
0013976           6A                  6A-ca               6B
0017666           9V                  9V                  9V
0019532           23F                 23F-g               23F
0102206           8                   8                   8
0103678           19F                 19F                 19F
0104603           6B                  6B-q                6B
0104604           22F                 22F
0104912           4                   4                   4
[illegible]       14                  14-g                14                 AF532644

Reference panel 3 [5], Canada
MA007753          31                  31                                     AF532684
MA00[illegible]   5                   5-q
MA008229          10F                 10F                                    AF532636
MA008562          11A                 11A-q
MA008622          31                  31
MA050408          23A                 23A-23F             23F-X; 23F-Y-NEG   AF532674
MA050663          18F                 18F                 18C                AF532662; AY163207, AY163230
MA050910          2                   2-q                 2
MA050947          38                  38/25F                                 AF532712
MA051117          22A                 22A
MA051617          35F                 35F                                    AF532692
MA051950          31 (see Example 2)  31                                     AF532695
MA052002          15A                 15A-ca1                                AF532646
MA052150          11B                 11B                                    AF532639
MA052217          7C                  7C
MA052253          17F                 17F-35B
MA052433          23A                 23A-ca              23F-X; 23F-Y-NEG   AF532675
MA052434          15A                 15A-ca2                                AF532647
MA052628          18C                 18C/18B             18C                AY163215, AY163231
MA052979          15C                 15C-ca                                 AF532652
MA053096          20                  20/13
MA053188          15B                 15B-q
MA053392          18B                 18B/18C             18C                AF532660; AY163211, AY163227
MA053567          12F                 12F
MA053684          38                  38/25F
MA053782          13                  13/20                                  AF532642
MA053909          35B                 35B
MA054004          13                  13/20
MA054006          13                  13/20
MA054242          38                  38/25F
MA054294          16F                 16F
MA054338          35F                 35F
MA054357          1                   1                   1
MA054490          34                  34                                     AF532690
MA054545          3                   3                   3
MA054735          10A                 10A-q
MA054832          34                  34
[missing]         7F                  7F
MA055006          9V                  9V                  9V
MA055054          22F                 22F
MA055100          6A                  6A-ca               6B                 AF532702; AY163174, AY163184
MA056382          19A                 19A                 19A                AF532664
MA059287          25F                 25F/38                                 AF532711
MA061296          41A (see Example 2) 41A                                    AF532694
MA061378          17A                 17A                                    AF532655
MA061938          21                  21                                     AF532671
MA062028          29                  29                                     AF532680
MA062610          18B                 18B/18C             18C                AY163210, AY163226
MA063013          9N                  9N
MA063073          33F                 33F-g/33A           33F/37             AF532689; AY163201, AY163220
MA063087          33A                 33A/33F-g           33F/37             AF532685; AY163204, AY163222
MA063189          Nonserotypeable     No amplicon
MA063207          37                  37                  33F/37             AF532713; AY163205, AY163223
MA063745          Nonserotypeable     Nonserotypeable-ca                     AF532715

Reference panel 4 [6], New South Wales
00-177-0145       19A                 19A                 19A
01-184-0091       18C                 18C/18B             18C
00-237-0230       17F                 17F-35B                                AF532656
01-273-0175       16F                 16F
00-201-0306       14                  14-g                14
01-117-0176       13                  13/20
01-239-0283       12F                 12F
00-206-0233       11A                 11A-q
00-222-0342       10A                 10A-23F             23F-NEG            AF532634
01-180-0149       1                   1                   1
01-122-0226       6A                  6A-ca               6B                 AF532698; AY163172, AY163182
99-308-0385       4                   4
00-234-0199       38                  38/25F
00-074-0065       35F                 35F
00-280-0121       3                   3                   3
99-308-0290       23F                 23F-g               23F
00-244-0101       22F                 22F
00-250-0302       22A                 22A
00-244-0108       20                  20/13
01-009-0101       19F                 19F                 19F                AF532668
01-254-0150       7F                  7F

Reference panel 5 [7], New South Wales (CIDM)
00-163-0650       14                  14-g                14
00-141-1399       19F                 19F                 19F
00-070-0212       23F                 23F-g               23F
01-018-1842       4                   4                   4
00-201-1422       6B                  6B-g                6B                 AF532703; AY163178, AY163188
00-180-2749       9V                  9V                  9V
00-339-3084       9N                  9N
00-017-0985       11A                 11A-q
01-072-0391       12F                 12F                                    AF532641
00-315-3100       15B                 15B-c                                  AF532648
99-259-1456       18C                 18C/18B             18C
00-273-2862       4                   4                   4
00-081-2291       33F                 33F-g/33A           33F/37             AY163198, AY163216
00-118-2067       5                   5-c                                    AF532696
99-196-2882A
Notes.

1. CS of selected S. pneumoniae isolates from reference panels 1 and 3 was
repeated by Gail Stewart and Robert Gange at the Department of Microbiology, Children's
Hospital at Westmead, New South Wales, Australia.

2. MCT was performed and GenBank accession numbers generated by Fanrong
Kong at the Centre for Infectious Diseases and Microbiology (CIDM), Institute of Clinical
Pathology and Medical Research (ICPMR), Westmead Hospital, Westmead, New
South Wales, Australia. See text for molecular capsular subtype (mctsp) nomenclature.

3. Provided by Denise Murphy, Pneumococcal Reference Laboratory, Public
Health Microbiology, Queensland Health Scientific Services, Queensland, Australia.

4. Provided by Associate Professor Geoff Hogg and Jenny Davis, Microbiological
Diagnostic Unit (MDU), Public Health Laboratory, Department of Microbiology and
Immunology, University of Melbourne, Victoria, Australia.

5. Provided by Dr. Louise P. Jette, Institut National de Sante Publique du Quebec-Laboratoire
de Sante Publique du Quebec, Sainte-Anne-de-Bellevue, Quebec H9X
3R5, Canada.

6. Provided by Dr. Michael Watson, Department of Microbiology, Children's
Hospital at Westmead, New South Wales, Australia.

7. Selected 21 S. pneumoniae clinical isolates, for which CS results were known,
from the CIDM diagnostic laboratory.

8. 152 Australian S. pneumoniae clinical isolates, for which CS results were known,
from the CIDM diagnostic laboratory.

9. 103 New Zealand S. pneumoniae clinical isolates provided by Dr. Diana Martin,
Streptococcus Reference Laboratory, Institute of Environmental Science and
Research (ESR), Wellington, New Zealand.

Conventional serotyping (CS)

CS was performed by the Quellung reaction using rabbit polyclonal antisera
from the Statens Serum Institut, Copenhagen, Denmark (Sorensen, 1993). Briefly, 2
µL of a suspension of the isolate in 10% formalin saline and 1 µL of antiserum were
placed under a glass coverslip and examined for capsular swelling using a light microscope
at 400x magnification. Clinical isolates from CIDM were serotyped at the Department of
Microbiology, Children's Hospital at Westmead, Sydney, Australia, and those from
New Zealand by the Streptococcus Reference Laboratory at ESR, Wellington, New
Zealand. Selected New Zealand clinical isolates for which only serogroup results were
available, and selected isolates from reference panels 1 and 3, were re-tested at
Children's Hospital at Westmead.

Molecular capsular sequence typing - development of method
Oligonucleotide primers

The oligonucleotide primers used in this study, their target sites and melting
temperatures are shown in Table 2, and the primer pair specificities and expected
amplicon lengths in Table 3. Primers were designed with high melting temperatures so
that they could be used in rapid cycle PCR (Kong et al., 2000).
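Table 2 lists primer melting temperatures, but the calculation method is not stated here. As a rough illustration only, the sketch below estimates Tm with two common textbook approximations: the Wallace rule (2 °C per A/T, 4 °C per G/C) and the basic GC-content formula Tm = 64.9 + 41(nGC - 16.4)/N. Both formulas are assumptions, and the values in Table 2 may have been obtained with a different (for example nearest-neighbour) method. The example primer sequences are hypothetical.

```python
# Rough primer Tm estimates. The formulas are textbook approximations, not the
# method used for Table 2; example primers are hypothetical.

VALID_BASES = set("ACGT")


def _clean(primer: str) -> str:
    """Upper-case the primer and reject empty or non-ACGT input."""
    seq = primer.upper()
    if not seq or set(seq) - VALID_BASES:
        raise ValueError(f"unexpected characters in primer: {primer!r}")
    return seq


def tm_wallace(primer: str) -> float:
    """Wallace rule: 2 degC per A/T and 4 degC per G/C (short oligos only)."""
    seq = _clean(primer)
    gc = seq.count("G") + seq.count("C")
    return 2.0 * (len(seq) - gc) + 4.0 * gc


def tm_basic(primer: str) -> float:
    """Basic GC formula, usually applied to oligos longer than about 13 nt."""
    seq = _clean(primer)
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def tm_matched(forward: str, reverse: str, max_diff: float = 2.0) -> bool:
    """True if the basic Tm estimates of a primer pair differ by at most max_diff degC."""
    return abs(tm_basic(forward) - tm_basic(reverse)) <= max_diff


if __name__ == "__main__":
    low = "ACGTACGTACGTACGTACGT"  # hypothetical 20-mer, 50% GC
    high = "GC" * 12 + "G"         # hypothetical 25-mer, 100% GC
    print(round(tm_basic(low), 2), round(tm_basic(high), 2))
```

By the basic formula the 50% GC 20-mer gives about 51.8 °C and the all-GC 25-mer about 79.0 °C, illustrating how length and GC content raise the estimated melting temperature.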

Four previously published S. pneumoniae-specific primers, targeting psaA (P1,
P2) (Morrison et al., 2000) and pneumolysin (IIa, IIb) (Salo et al., 1995), were modified
to give high melting temperatures and used to confirm that isolates were S.
pneumoniae. Primers were designed to amplify and sequence a portion of the cpsA-cpsB
gene region and to amplify serotype/serogroup-specific sequences in the wzy and wzx
genes of 16 S. pneumoniae serotypes for which cps gene cluster sequences were
available. To further explore the sequence heterogeneity, parts of the wzx and
wzy genes of isolates belonging to serogroups 6, 18, 23 and 33/37 were also sequenced.
For serotype 3, which does not contain wzy and wzx genes, serotype-specific PCR
targeted the orf2 (wze)-cap3A-cap3B region (Arrecubieta et al., 1996).
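Expected amplicon lengths such as those in Table 3 can be checked in silico by locating the forward primer and the reverse complement of the reverse primer in a reference sequence. The sketch below is a minimal, hypothetical helper: it assumes exact primer matches on the given strand only and uses a toy template, whereas a realistic check would also tolerate mismatches and search both strands.

```python
# Minimal in-silico PCR: exact primer matches on one strand (an assumption).
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Length of the product from forward to reverse primer, or None if absent.

    The reverse primer anneals to the opposite strand, so its reverse
    complement is searched for downstream of the forward primer.
    """
    t = template.upper()
    start = t.find(forward.upper())
    if start == -1:
        return None
    site = reverse_complement(reverse.upper())
    end = t.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


if __name__ == "__main__":
    template = "TTTT" + "ACGTAC" + "GGGGGG" + "GTACCA" + "AAAA"  # hypothetical
    print(amplicon_length(template, "ACGTAC", "TGGTAC"))  # product spans 18 nt
```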
Table 3. Specificity and expected lengths of amplicons of primer pairs used
in this study.

Primer pairs [1]        Specificity             Amplicon length (bp)
P1/P2                   S. pneumoniae           864
IIa/IIb                 S. pneumoniae           224
cpsS1/cpsA3 [2]         S. pneumoniae           1001
cpsS1/cpsA1 [2]         S. pneumoniae           520
cpsS3/cpsA2 [2]         S. pneumoniae           503
1YS/1YA                 serotype 1              296
2YS/2YA                 serotype 2              348
4YS/4YA                 serotype 4              348
6A6BYS/6A6BYA           serogroup 6             315
6A6BYS0/6A6BYA1 [2]     serogroup 6             747
8YS/8YA                 serotype 8              277
9V9AYS/9V9AYA           serotypes 9V and 9A     338
14YS/14YA               serotype 14             310
18CYS/18CYA             serogroup 18            302
18CYS0/18CYA1 [2]       serogroup 18            671
19FYS/19FYA             serotype 19F            286
19AYS/19AYA             serotype 19A            270
19B19CYS/19B19CYA       serotypes 19B and 19C   428
23FYS/23FYA             serotype 23F            280
33F37YS/33F37YA         serotypes 33F/33A/37    310
33F37YS0/33F37YA1 [2]   serotypes 33F/33A/37    668
1XS/1XA                 serotype 1              426
2XS/2XA                 serotype 2              429
4XS/4XA                 serotype 4              324
6A6BXS/6A6BXA           serogroup 6             305
6A6BXS0/6A6BXA1 [2]     serogroup 6             1102
8XS/8XA                 serotype 8              325
9V9AXS/9V9AXA           serotypes 9V and 9A     368
14XS/14XA               serotype 14             289
18CXS/18CXA             serogroup 18            368
18CXS0/18CXA1 [2]       serogroup 18            721
19FXS/19FXA             serotype 19F            305
19AXS/19AXA             serotype 19A            300
19B19CXS/19B19CXA       serotypes 19B and 19C   327
23FXS/23FXA             serotypes 23F/23A       401
23FXS0/23FXA1 [2]       serotypes 23F/23A       744
33F37XS/33F37XA         serogroups 33/37        328
33F37XS0/33F37XA1 [2]   serotypes 33F/33A/37    746
3S1/3A1                 serotype 3              321
3S2/3A2                 serotype 3              297

Notes.
1. See Table 2 for primer sequences.
2. For sequencing use only.



DNA preparation, PCR and sequencing 

DNA extraction, PCR and sequencing were performed as previously described 
(Kong et al., 2002). 

Sequence comparison, multiple sequence alignments, and phylogenetic analysis 

Sequences were compared using Bestfit in Comparison program group. Multiple 
sequence alignments were performed with Pileup and Pretty in Multiple Sequence 
Analysis program group. Phylogenetic relationships were studied using Ednadist and 
Ekitsch in Evolutionary Analysis program group. All programs are provided in 
WebANGIS, ANGIS (Australian National Genomic Information Service), 3"^ version. 
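Distance-based tree building of the kind performed with Ednadist and Ekitsch starts from a matrix of pairwise evolutionary distances between aligned sequences. As an illustration only, the sketch below computes Jukes-Cantor corrected distances, skipping gap positions pairwise. This model is chosen for simplicity and is an assumption: Ednadist offers several substitution models, and the one used in this study is not stated here. The toy sequences are hypothetical.

```python
# Pairwise Jukes-Cantor distances for aligned sequences (illustrative sketch).
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a: str, b: str) -> float:
    """d = -3/4 ln(1 - 4p/3); infinite when p >= 0.75 (saturation)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric matrix of Jukes-Cantor distances keyed by (name, name)."""
    names = list(seqs)
    return {(m, n): jukes_cantor(seqs[m], seqs[n]) for m in names for n in names}


if __name__ == "__main__":
    toy = {"x": "ACGTACGTAC", "y": "ACGTACGTAA", "z": "ACGT-CGTAA"}  # hypothetical
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 4))
```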

Nucleotide sequence accession numbers 

The new partial sequence data for the cpsA-cpsB, wzy (polymerase) and wzx
(flippase) genes of selected reference and clinical isolates reported here have
appeared in the GenBank Nucleotide Sequence Database under accession numbers
AF532632-AF532715 (cpsA-cpsB) and AY163171-AY163232 (wzy and wzx) (Table 1).

Previously reported sequence data used in this paper, in addition to those listed
in Table 2, have appeared in the GenBank Nucleotide Sequence Database under the
following accession numbers: U15171, U66846 and U66845 (cps gene cluster for
serotype 3); NC_003028 (serotype 4 genome); AJ239004 (cps gene cluster for
serotype 8); AF030367-AF030372 (cps gene cluster for serotype 19F); AF105113
(partial cps gene cluster for serotype 19A); AF105114 and AF106137 (partial cps gene
clusters for serotype 19B); AF105115 (partial cps gene cluster for serotype 19C);
AF030373 and AF030374 (cps gene clusters for serotype 23F).

RESULTS 

Both pairs of S. pneumoniae species-specific primers (targeting the psaA and
pneumolysin genes) produced amplicons of the expected size from all reference and
clinical isolates except six of the 179 CIDM isolates, which, on retesting, were optochin
resistant and were therefore excluded from further study as they were not S. pneumoniae.

The sequencing primers cpsS1/cpsA3 formed amplicons from all but 13
reference and clinical isolates. Of these 13 isolates, 10 (eight belonging to serotypes
38/25F and two that were nonserotypable) formed amplicons with primer pairs
cpsS1/cpsA1 and cpsS3/cpsA2. Three nonserotypable isolates did not form amplicons
with any of the primer pairs targeting the cpsA-cpsB region, although they had been
confirmed to be S. pneumoniae by both species-specific PCRs.

Sequence heterogeneity in the region between the 3'-end of cpsA and the 5'-end of
cpsB

The present inventors sequenced and analyzed 800 bp fragments of the region
between the 3'-end of cpsA (starting at base pair 951) and the 5'-end of cpsB (see
Figure 2). Representative sequences were deposited in GenBank (see Table 1 for
accession numbers). There were 424 sites that were identical for all 51 serotypes
represented among the isolates examined, leaving 376 (47%) heterogeneity sites.
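Counting conserved and heterogeneous positions is a simple column scan of the alignment. The sketch below is a hypothetical helper run on a toy alignment, treating a gap as one more character state (an assumption; how gaps were handled in the study is not stated). The counts reported above, 424 conserved and 376 variable positions out of 800 (376/800 = 47%), are the kind of summary such a scan produces.

```python
# Column scan of an alignment: conserved vs variable (heterogeneity) sites.

def site_summary(alignment: dict[str, str]) -> tuple[int, list[int]]:
    """Return (number of conserved columns, 0-based indices of variable columns).

    A gap '-' is treated as one more character state (an assumption).
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    variable = [i for i in range(length) if len({s[i].upper() for s in seqs}) > 1]
    return length - len(variable), variable


def percent_variable(alignment: dict[str, str]) -> float:
    """Percentage of alignment columns that are variable."""
    conserved, variable = site_summary(alignment)
    return 100.0 * len(variable) / (conserved + len(variable))


if __name__ == "__main__":
    toy = {"a": "ACGTA", "b": "ACGTT", "c": "TCGTA"}  # hypothetical sequences
    print(site_summary(toy), percent_variable(toy))
```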

Intra- and inter-serotype/subtype heterogeneity

Only single isolates were available for 11 serotypes and for the mixed serotype
9V/14 (see below). Of the 40 serotypes for which multiple isolates were available, 14
were divided into molecular capsular sequence types on the basis of major and/or
stable intra-serotype heterogeneity. Molecular capsular sequence types were named
according to their conventional serotype (CS) and, generally, the source of the isolate in
which the sequence difference was first identified [-g (GenBank sequence); -c (CIDM);
-q (Queensland); -ca (Canada); -nz (New Zealand)]. When sequences characteristic of
two serotypes were present in the cpsA-cpsB region, subtype names included both, with
the CS first (e.g. 23F-23A when CS was 23F; 23A-23F when CS was 23A). Seventeen
serotypes had no intra-serotype heterogeneity, and in nine there were minor and/or less
stable variations between isolates and/or between the sequences disclosed herein and
the corresponding sequences in GenBank (Table 4, Figure 2).
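The naming convention above can be written as a small rule. The sketch is a hypothetical helper, not part of the described method; the numeric suffix for a second subtype from the same source (as in 15A-ca1 and 15A-ca2 in Table 1) is inferred from the table rather than stated in the text.

```python
# Hypothetical encoding of the molecular capsular sequence type naming rules.
from typing import Optional

SOURCE_SUFFIX = {
    "GenBank": "g",
    "CIDM": "c",
    "Queensland": "q",
    "Canada": "ca",
    "New Zealand": "nz",
}


def sequence_type_name(cs: str, source: Optional[str] = None,
                       index: Optional[int] = None,
                       other_serotype: Optional[str] = None) -> str:
    """Build a sequence type name from the conventional serotype (CS).

    other_serotype: cpsA-cpsB carries sequence characteristic of a second
        serotype; both are named, CS first (e.g. 23F-23A).
    source: where the distinguishing sequence was first identified.
    index: optional number for several subtypes from one source.
    """
    if other_serotype is not None:
        return f"{cs}-{other_serotype}"
    if source is None:
        return cs  # no stable intra-serotype heterogeneity
    suffix = SOURCE_SUFFIX[source]
    return f"{cs}-{suffix}{index if index is not None else ''}"


if __name__ == "__main__":
    print(sequence_type_name("6B", "Queensland"))
    print(sequence_type_name("15A", "Canada", 2))
    print(sequence_type_name("23A", other_serotype="23F"))
```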
There were 368 heterogeneity sites that allowed differentiation between
molecular capsular sequence types, including both type-specific and shared sites (Table 4,
Figure 2).

Phylogenetic tree based on the region from the 3'-end of cpsA to the 5'-end of cpsB

Using these 800 bp sequences, a phylogenetic tree was inferred for the 132
S. pneumoniae molecular capsular sequence types identified by analysis of the
cpsA-cpsB region, including the new sequences from Example 2 (Figure 3; note that in
Figure 3 the sequence types were renamed according to serotype and GenBank
accession number). Typical class I serotypes (e.g. 1, 18C, 19F), a typical class II
serotype (e.g. 33F, represented by 33F-g) and a nontypical class II serotype (19A) were
each in different clusters of the tree (Jiang et al., 2001).

The phylogenetic tree provides evidence for, and suggests possible sources of,
recombination between the cpsA-cpsB genes of classes I and II. For example, subtype 23F-c
(or 23F-AF532678) clustered with 15A-ca2 (or 15A-AF532647), but in a separate cluster
from other 23F and 15A subtypes, suggesting that they may have arisen by
recombination between 23F and 15A, respectively, and other serotypes.

Molecular capsular sequence typing based on cpsA-cpsB region sequences

The molecular capsular sequence type, assigned on the basis of the cpsA-cpsB
sequence, was the same as the CS for all isolates belonging to 36 of 51 serotypes (or
304 of 394 [77%] isolates), and for the majority of isolates (25 of 39) belonging to
another five serotypes (Table 5). The remaining isolates in these serotypes shared
sequences with other serotypes, namely 6A with 6B, 10A and 23A with 23F, 15B with
22F, and 17F with 35B, presumably as a result of recombination. There were five
serotype pairs, represented by 46 isolates, whose members had identical sequences,
namely 20/13, 18C/18B, 38/25F, 31/42 and 33F-g/33A.
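Assigning a molecular capsular sequence type from a cpsA-cpsB sequence amounts to finding the reference sequence(s) it matches best; where two serotypes share an identical reference sequence, the result is necessarily a serotype pair. The sketch below is a hypothetical illustration using identity over ungapped aligned positions and toy reference sequences, not the sequence analysis software used in the study. If the CS is known it is listed first, following the document's naming of pairs such as 20/13.

```python
# Hypothetical nearest-reference assignment of a cpsA-cpsB sequence type.
from typing import Optional


def identity(a: str, b: str) -> float:
    """Fraction of identical bases over aligned positions without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x == y for x, y in pairs) / len(pairs)


def assign_type(query: str, references: dict[str, str],
                cs: Optional[str] = None) -> tuple[str, float]:
    """Return the best-matching reference type(s), joined by '/', and the identity.

    Ties (e.g. serotypes with identical reference sequences) are all reported;
    if the conventional serotype cs is among them it is listed first.
    """
    scores = {name: identity(query, seq) for name, seq in references.items()}
    best = max(scores.values())
    hits = sorted(name for name, score in scores.items() if score == best)
    if cs in hits:
        hits.remove(cs)
        hits.insert(0, cs)
    return "/".join(hits), best


if __name__ == "__main__":
    refs = {"13": "ACGTACGT", "20": "ACGTACGT", "9V": "TCGAACGA"}  # toy sequences
    print(assign_type("ACGTACGA", refs, cs="20"))
```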
Molecular capsular sequence typing based on PCR targeting wzy and wzx
(orf2 (wze)-cap3A-cap3B for serotype 3)

There is significant sequence heterogeneity in wzy and wzx (data not shown),
which makes them suitable PCR targets for serogroup or serotype identification (Tables
2 and 3). With few exceptions, primer pairs targeting these genes formed amplicons
only from the corresponding serotypes represented in the five reference panels.
The exceptions were: PCR targeting serotype 6B also amplified 6A; PCR targeting 18C
amplified all serotypes in serogroup 18; PCR targeting wzx (but not wzy) of serotype
23F amplified three serotype 23A strains; PCR targeting wzx and wzy of serotypes
33/37 amplified a 33A isolate, and that targeting wzx amplified a serotype 33B isolate.

The specificity of the serotype 3-specific primers targeting the orf2 (wze)-cap3A-cap3B
genes (Arrecubieta et al., 1996) was confirmed by production of an amplicon of
the expected size from all 17 serotype 3 isolates. Thus, a serotype or serogroup was
assigned by PCR to all 239 isolates belonging to serotypes/serogroups for which
specific PCR was developed (Table 5).

Comparison of molecular capsular sequence typing based on cpsA-cpsB sequencing
and PCR/sequencing targeting wzx and wzy

The results of PCR and cpsA-cpsB sequencing were consistent, except that PCR
could not distinguish between some members of serogroups 6, 18, 23 and 33/37, and
further sequencing (of wzx and wzy) was required to identify individual molecular capsular
sequence types (see below). The cpsA-cpsB sequences of six 10A isolates were
identical to those of 23F, but the isolates were negative in the 23F-specific PCRs
targeting wzx and wzy (10A-23F).


Relationships within serogroups

Sequence analysis of the cpsA-cpsB region and the wzy and wzx genes (data not
shown) showed that the phylogenetic relationships between members of the same
serogroup varied from one serogroup to another.



Serogroup 6 

Serotypes 6A and 6B were divided into five and three subtypes, respectively,
based on different sequence patterns in the cpsA-cpsB region. Three 6A isolates had
sequences in this region characteristic of serotype 6B (Table 4). Serotypes 6A and 6B
could not be distinguished by PCR targeting wzx and wzy. Sequencing of these genes
correctly identified all except one 6A isolate, but some 6A and 6B subtypes share
identical or very similar sequences. The serotype of the discrepant isolate (serotype 6A,
6B-q) was checked independently by two laboratories (Vakevainen et al., 2001).

Serogroup 18

Serotypes 18C and 18B had identical cpsA-cpsB region sequences and were
close to 18A and 18F in the class I cluster (Figure 3). PCR targeting both the wzx and wzy
genes amplified all four serotypes. The sequences of 18C and 18B were identical to each
other, but different from those of serotypes 18A and 18F, which were also
distinguishable from each other.



Serogroup 23 

Serotypes 23F, 23A (except 23F-23A and 23A-23F) and 23B were separated
into different clusters based on cpsA-cpsB sequence differences. Serotype 23A
(including 23A-23F) was identified on the basis of a positive result with 23F-specific
primers targeting wzx and a negative result with the corresponding wzy PCR.
Sequencing could differentiate individual serotypes (23A, 23F and 23B) except 23F-23A
and 23A-23F. Most 23F-c, 23A-23F and 23F-23A subtypes have apparently arisen by
recombination between 23F, 23A and/or other serotypes, producing sequences in the
cpsA-cpsB region that are quite different from their parental types.

Serogroups 33 and 37

Serotypes 33A and 33F-g share an identical cpsA-cpsB sequence, and that of 33B
is similar; 37 and 33F-g cluster together, as do 33B and 33F-q (Figure 3). The
33F/37-specific wzx PCR amplified 37, 33F, 33A and 33B, indicating similarities at that site,
although sequencing showed clear differences between 33B and the others. The
33F/37-specific wzy PCR amplified 37, 33F and 33A but not 33B. Thus, 33B was
identified on the basis of a positive result with 33F/37-specific primers targeting wzx
and a negative result with the corresponding wzy PCR.
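The two-target reasoning used for serogroups 23 and 33/37 can be summarised as a lookup table. The sketch below is a hypothetical helper that encodes only the result patterns reported in this example; the table name and outcome strings are assumptions, and any pattern not observed here is returned as unexpected rather than guessed.

```python
# Hypothetical interpretation of paired wzx/wzy PCR results, encoding only the
# patterns reported for the 23F-specific and 33F/37-specific primer pairs.

PCR_PATTERNS = {
    # (wzx positive, wzy positive) -> interpretation
    "23F": {
        (True, True): "23F",
        (True, False): "23A",
        # 10A isolates with 23F-like cpsA-cpsB sequence gave this pattern (10A-23F)
        (False, False): "none of 23F/23A",
    },
    "33F/37": {
        (True, True): "33F, 33A or 37",
        (True, False): "33B",
        (False, False): "none of 33F/33A/33B/37",
    },
}


def interpret_pcr(primer_set: str, wzx_positive: bool, wzy_positive: bool) -> str:
    """Map the wzx/wzy PCR result pattern for one primer set to a serotype call."""
    patterns = PCR_PATTERNS[primer_set]
    return patterns.get((wzx_positive, wzy_positive),
                        "unexpected pattern: repeat PCR or sequence")


if __name__ == "__main__":
    print(interpret_pcr("23F", True, False))
    print(interpret_pcr("33F/37", True, False))
```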

Other serogroups

Despite the antigenic similarities that determine their membership of the same
serogroup, serotypes 9N and 9V appear to be genetically distant, on the basis of
significant differences between their cpsA-cpsB sequences and the fact that 9V-specific
PCR did not amplify 9N.

Similarly, serotypes 19F and 19A had quite different cpsA-cpsB region sequences and
separated into different clusters. 19F-specific PCR did not amplify 19A and vice versa.
There were differences between serotypes 19F, 19A, 19B and 19C in wzx and wzy sequences
(although the wzy sequence of 19C was not available in GenBank), but they formed two
groups: 19F and 19A, and 19B and 19C.

Serotypes 7F and 7C separated into different clusters based on cpsA-cpsB
sequences, as did 11A and 11B (Figure 3). Serotypes 15B and 15C had similar
cpsA-cpsB sequences and clustered together, except for 15B-22F. Serotypes 17F (including
17F-c and 17F-35B) and 17A clustered together. Serotypes 35F and 35B are
closely related, based on their similar cpsA-cpsB sequences.

Mixed culture

One clinical isolate identified as serotype 9/14 using antisera was positive in
9V- and 14-specific PCR (targeting both wzx and wzy), but was identified as serotype 9V by
sequencing. The isolate was subcultured and 16 individual colonies were tested. All 16
colonies were positive in both 9V-specific PCR assays and negative in both 14-specific PCR
assays, and were identified as serotype 9V by sequencing. The serotype of the original
isolate was rechecked and the results (mixed serotype 9/14) were as before. It was
therefore assumed that the original isolate was a mixture, predominantly of serotype 9V
with a minor component of serotype 14.

Comparison of serotype identification results between molecular capsular sequence
typing and CS

After CS and molecular capsular sequence typing had been completed, the
results were compared. Initial results were discrepant for 29 isolates; repeat serotyping
and/or correction of clerical errors resolved all but five discrepancies. Final results
correlated between the CS and molecular capsular sequence typing methods for all isolates
of 38 serotypes (318 isolates), 20 of 25 isolates of another three serotypes, and all five
nonserotypable isolates (total 343 isolates). In addition, there were 46 isolates
belonging to pairs of serotypes whose members could not be distinguished from each
other by molecular capsular sequence typing, but all were assigned to the pair that
included the serotype to which they had been assigned by CS. These results were
classified as consistent.

The five discrepant results were: one isolate of serotype 6A was identified as
6B-q, two isolates of serotype 15B were identified as 22F, and two isolates of serotype
17F were identified as 35B.
Algorithm for serotype assignment of S. pneumoniae by molecular capsular sequence
typing

An algorithm for practical use of the molecular capsular sequence typing
method for the identification of S. pneumoniae serotypes is shown in Table 6.

DISCUSSION 

Sequences of 16 gene clusters showed that all have the same four genes at
their 5' ends - cpsA (wzg)-cpsB (wzh)-cpsC (wzd)-cpsD (wze) - which are the sites for
recombination events that generate new forms of capsular polysaccharide. The
sequences for different serotypes can be divided into two classes and show evidence of
interesting recombination patterns.

The study of 51 serotypes, of which 40 were represented by more than one
isolate, showed that the cpsA-cpsB sequences for the same serotype were generally
stable or could be consistently divided into a small number of subtypes. This shows that
sequence patterns in this region can be used to identify different
serotypes/serosubtypes.

It has been shown previously that PCR-RFLP based on the cpsA-cpsB region
can predict S. pneumoniae serotypes (Lawrence et al., 2000). However, the method
generates a long amplicon (1.5 kbp), requires the use of three restriction enzymes and
special equipment, and has limited discriminatory ability.

The present inventors identified 376 sequence heterogeneity sites in the
cpsA-cpsB region among the 51 serotypes studied (Table 4, Figure 2), which allowed a
practical MCT assay based on sequencing to be developed. Several pairs of primers
were designed to amplify a 1001 bp segment within the cpsA-cpsB region, based on the
following considerations: the primers formed amplicons from virtually all S.
pneumoniae isolates (>99% of those examined); the amplicon is small enough to be
amplified using normal PCR protocols; the region of interest (800 bp) can be sequenced
in a single reaction; and the method is objective. The target included most of the
variable sites (bp 951 to 1747), providing maximum discrimination between closely
related serotypes (e.g. members of serogroups 33 and 37 that could not be distinguished
by serotype/group-specific PCR).
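Counting heterogeneity sites amounts to scanning an alignment column by column. A minimal sketch, assuming the cpsA-cpsB sequences have already been aligned (for example with ClustalW) and that a gap character counts as a difference; the function names are illustrative and the toy alignment is not real cpsA-cpsB data. The helper `count_in_window` shows how one might check how many sites fall inside a candidate target such as bp 951 to 1747.

```python
def variable_sites(aligned):
    """Return 1-based alignment columns at which the sequences are not all identical.

    aligned: sequences already aligned to equal length; a gap ('-') is treated
    as a distinct character, so indels also count as heterogeneity sites.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i in range(length)
            if len({seq[i].upper() for seq in aligned}) > 1]


def count_in_window(sites, start, end):
    """Count sites lying within an inclusive window, e.g. bp 951 to 1747."""
    return sum(start <= s <= end for s in sites)


sites = variable_sites(["ACGTA", "ACCTA", "ACGTT"])
print(sites)                         # [3, 5]
print(count_in_window(sites, 4, 5))  # 1
```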
Some of the 376 heterogeneity sites in the cpsA-cpsB region were specific for
an individual molecular capsular sequence type (Table 4, Figure 2), while others were
shared between several. Based on these patterns, plus PCR and selective sequencing of
type-specific regions of wzx and wzy, most of the 51 serotypes represented among our
394 isolates could be distinguished and further divided into a total of 71 molecular
capsular sequence types, with the aid of sequence analysis software. The final CS and
molecular capsular sequence typing results correlated for 343 of the 389 isolates (88%)
for which results from both methods were available, including five that were nontypable
by either method. For 46 isolates belonging to five serotype pairs, the members of which
could not be distinguished by sequencing, results were classified as consistent, leaving
unresolved discrepancies between the methods for only five (1.2%) isolates.
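The text does not say which sequence analysis software performed the matching. One simple way to assign a query to a molecular capsular sequence type is to pick the reference with the fewest mismatches over the sequenced target; the sketch below does that, assuming pre-aligned, equal-length sequences. The reference names and sequences are toy placeholders, and the optional `max_mismatches` cutoff is a hypothetical parameter for flagging isolates that match nothing in an incomplete database.

```python
def mismatches(a, b):
    """Number of differing positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def closest_sequence_type(query, references, max_mismatches=None):
    """Return (sequence_type, mismatches) for the closest reference.

    references: dict of sequence-type name -> aligned reference sequence.
    If max_mismatches is given and even the best hit exceeds it, the name is
    returned as None (no sufficiently close type in the database).
    Ties are broken by dict order (the first minimum wins).
    """
    name, distance = min(((n, mismatches(query, seq)) for n, seq in references.items()),
                         key=lambda hit: hit[1])
    if max_mismatches is not None and distance > max_mismatches:
        return None, distance
    return name, distance


refs = {"typeA": "ACGTACGT", "typeB": "ACGTTCGA"}  # toy placeholders
print(closest_sequence_type("ACGAACGT", refs))                    # ('typeA', 1)
print(closest_sequence_type("ACGAACGT", refs, max_mismatches=0))  # (None, 1)
```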

Sequence analysis of the cps gene clusters of 16 serotypes showed that wzy
(capsular polysaccharide polymerase gene) and wzx (capsular polysaccharide flippase
gene) are highly variable, making them suitable targets for direct serotype identification
by PCR. The present inventors designed serotype-specific PCR primers for these
serotypes, targeting wzx and wzy and, for serotype 3, which has no wzy and wzx genes,
targeting orf2 (wze)-cap3A-cap3B (Arrecubieta et al., 1996). It was found that
presumed serotype-specific primers for 6A, 18C, 23F and 33F/37 were not serotype-
specific but amplified other related serotypes. To improve the molecular capsular
sequence typing methods, portions of the wzy and wzx genes of serotypes within these
groups were sequenced, which allowed molecular capsular sequence types to be
distinguished within these serotypes/groups and demonstrated relationships between
them.
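Cross-amplification of the kind observed for the presumed serotype-specific primers above can be anticipated in silico by checking whether both primers of a pair bind a related serotype's template. A minimal sketch, assuming exact matches only and, as the 3'-...-5' notation of Table 7 suggests, that antisense primers are written 3'->5' so that their complement reads along the sense strand. Real primer evaluation would tolerate mismatches away from the 3' end and handle degenerate positions such as C/T; this sketch does neither, and `predict_amplicon` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def predict_amplicon(template, sense_5to3, antisense_3to5):
    """Predicted amplicon length (bp) on the sense strand, or None if no product.

    sense_5to3: sense primer written 5'->3'.
    antisense_3to5: antisense primer written 3'->5', so its complement,
    read left to right, is the sense-strand binding site.
    Only exact, unambiguous (A/C/G/T) matches are considered.
    """
    template = template.upper()
    fwd = template.find(sense_5to3.upper())
    if fwd < 0:
        return None
    site = antisense_3to5.upper().translate(COMPLEMENT)
    rev = template.find(site, fwd)  # antisense site must lie downstream
    if rev < 0:
        return None
    return rev + len(site) - fwd


# Toy template, not a real cps sequence.
template = "GG" + "ATGCAT" + "TTTTTTTT" + "CCCAAA" + "GG"
print(predict_amplicon(template, "ATGCAT", "GGGTTT"))  # 20
print(predict_amplicon(template, "ATGCAT", "CCCCCC"))  # None
```

Running the same primer pair against templates from several related serotypes would flag any pair that yields a product on more than one of them.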

The present inventors have recognized that the large number of pneumococcal
serotypes would make it impractical to use serotype-specific PCR for all of them.
Nevertheless, wzy and wzx PCR can be used to resolve discrepancies between CS and
cpsA-cpsB region sequencing assays, e.g. for molecular capsular sequence types 10A-
23F and 23A-23F. Moreover, the use of two target regions in the cps gene cluster helps
to clarify the relationships between most of the sequence types that have apparently
arisen by recombination. Serotype/group-specific primers were evaluated using three
reference panels, which had been characterised by CS, and were then used to identify
clinical isolates of unknown CS. By PCR alone, 239 (61%) of our 394 clinical isolates
were assigned to a serotype or serogroup (Table 5). This method can be extended to
other MCTs when additional wzx and wzy sequences are available.
In some circumstances, sequencing of the cpsA-cpsB region may be more
practical than type-specific PCR. For most serotypes only a single method and fewer
primers (cpsS1/cpsA3 for most serotypes/isolates) are needed.

Previous studies have shown that serotypes included in the 23-valent polysaccharide
and 11-, 9- and 7-valent protein conjugate vaccines are those most frequently isolated from
normally sterile sites (CSF, blood) (Cohnan et al., 1998; Huebner et al., 2000). Among
173 consecutive pneumococcal "sterile site" isolates from adults in the CIDM
diagnostic laboratory over a 2.5-year period, correlation between MCT and CS was
good (171/173 CIDM isolates were correctly identified). The exceptions were two
serotype 15B isolates that were identified as molecular capsular sequence type 22F.
Five serotypes (4, 14, 19F, 23F and 9V, covered by all pneumococcal vaccines) accounted
for 57% of isolates.

Five of the 394 isolates studied were nontypable by both CS and molecular capsular
sequence typing (Barker et al., 1999). Isolates may be nonserotypable because of
decreased type-specific antigen synthesis, nonencapsulated phase variation, or insertion
or mutation of genes of the cps gene clusters. Failure to type them by molecular capsular
sequence typing reflects the fact that the sequence database is still incomplete (also the
reason for the further research in Example 2), although the target regions of two of the
five nonserotypable isolates have been sequenced.

In summary, the present inventors have developed a molecular capsular
sequence typing system for S. pneumoniae that is reproducible, can be performed by
any laboratory with access to PCR/sequencing, and does not require large panels of
expensive serotype-specific antisera. Work on an international collection of isolates in
our reference panels demonstrated a strong correlation between the cpsA-cpsB
sequence and CS. Heterogeneity in a relatively short sequence (800 bp) in this region,
supplemented by serotype/group-specific PCR targeting wzx and wzy, correctly
predicted the serotype of most unknown isolates belonging to 51 serotypes. These
novel molecular capsular sequence typing methods provide comprehensive strain
identification that will be useful for the epidemiological studies needed to
monitor serotype distribution and detect serotype switching, if any, among S.
pneumoniae isolates before and after the introduction and widespread use of
conjugate vaccines.
EXAMPLE 2 - Identification of S. pneumoniae serotypes by analysis of the wzx
and/or wzy genes

MATERIALS AND METHODS 
Pneumococcal clinical isolates

This study was based on 92 well-characterized S. pneumoniae isolates, which
represented 55 serotypes, including about 31 of the 39 serotypes that were not included
in Example 1. The sources of these isolates were: 72 from the China Medical Bacteria
Culture Collection Center, Beijing, PR China; 17 from the Royal College of Pathologists of
Australasia, Quality Assurance Program Pty Limited, New South Wales, Australia; and
three from Associate Professor Geoff Hogg and Ms Jenny Davis, Microbiological
Diagnostic Unit (MDU), Public Health Laboratory, Department of Microbiology and
Immunology, University of Melbourne, Victoria. Conventional serotyping (CS) had
been performed by the donor laboratories, and serotypes of the 75 strains were known at
the time of receipt. Twenty-three selected isolates (including all serotype 27, 28F and 16A
isolates, and two from Example 1 that had been identified as serotypes 42 and 41F,
respectively) were re-tested by the Quellung reaction, as described above, at the
Department of Microbiology, Children's Hospital at Westmead (Henrichsen, 1999).

Isolates were retrieved from storage by subculture on blood agar plates
(Columbia II agar base supplemented with 5% horse blood) and incubated overnight at
37°C in 5% CO2.



Annotation and analysis of wzx and wzy

Analysis of homology and protein hydrophobicity was performed to annotate
the wzx and wzy genes in the S. pneumoniae cps gene cluster. Blast and PSI-Blast
(Altschul et al., 1997) were used to search databases, including GenBank and the Pfam
protein motif database (Bateman et al., 2002), for possible gene functions. The TMHMM
v2.0 analysis program (Chen et al., 2003) was used to identify potential transmembrane
segments from the amino acid sequence. Sequence alignment and comparison were
done using the program ClustalW (Thompson et al., 1994). The phylogenetic trees were
generated by the neighbour-joining method using the programme MEGA (Kumar et al.
1994) (Figures 4 and 5).
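TMHMM predicts transmembrane helices with a hidden Markov model, which is not reproduced here. As a simpler, swapped-in illustration of hydrophobicity-based annotation, the sketch below applies the Kyte-Doolittle hydropathy scale with a 19-residue sliding window and a mean-hydropathy threshold of 1.6, values commonly used to flag candidate membrane-spanning segments. It is an approximation for intuition only, will not reproduce TMHMM output, and assumes a protein made of the 20 standard residues.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(protein, window=19):
    """Mean hydropathy of every window, indexed by 0-based window start."""
    values = [KD[aa] for aa in protein.upper()]  # KeyError on non-standard residues
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_segments(protein, window=19, threshold=1.6):
    """Merge above-threshold windows into 1-based inclusive (start, end) segments."""
    segments = []
    for i, mean in enumerate(window_means(protein, window)):
        if mean > threshold:
            start, end = i + 1, i + window
            if segments and start <= segments[-1][1] + 1:
                segments[-1] = (segments[-1][0], end)  # extend the current segment
            else:
                segments.append((start, end))
    return segments


toy = "K" * 10 + "L" * 25 + "K" * 10  # hydrophilic, hydrophobic core, hydrophilic
print(candidate_tm_segments(toy))  # [(6, 40)]
```

Note that window smoothing widens the reported segment beyond the true hydrophobic core (residues 11 to 35 in the toy protein), one reason model-based tools such as TMHMM are preferred.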



Oligonucleotide primers

In addition to our previous MCT primers (Example 1), numerous serotype(s)-
specific oligonucleotide primers, targeting wzy and wzx (one pair), were designed for
this study. The specificity, sequences, numbered base positions and melting
temperatures (Tm) are shown in Table 7. Expected amplicon lengths of different
primer pairs can be calculated from the 5'-end positions of the corresponding primers.
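The calculation mentioned above is a subtraction. A sketch, assuming the listed positions are inclusive, 1-based coordinates of each primer's 5' end on the same gene sequence; the text does not state the convention, so a one-base difference is possible if the positions are 0-based or exclusive. The three position pairs are copied from Table 7.

```python
def amplicon_length(sense_5prime, antisense_5prime):
    """Expected amplicon length (bp) from the 5'-end positions of a primer pair.

    Assumes both positions are 1-based coordinates on the same gene sequence
    and that the product includes both 5' ends.
    """
    if antisense_5prime <= sense_5prime:
        raise ValueError("antisense 5' end must lie downstream of the sense 5' end")
    return antisense_5prime - sense_5prime + 1


# Position pairs copied from Table 7.
pairs = {
    "10C-10F-wzy-specific": (51, 337),
    "15A-15B-15C-wzx": (202, 514),
    "16F-wzx": (1219, 1433),
}
for name, (s, a) in pairs.items():
    print(name, amplicon_length(s, a))
# 10C-10F-wzy-specific 287
# 15A-15B-15C-wzx 313
# 16F-wzx 215
```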

DNA preparation, PCR, sequencing and sequence analysis
DNA extraction, PCR, sequencing and sequence analysis were performed as
described in Example 1. The only exception was that, for the new PCRs, 55-60°C was
used as the annealing temperature because of the low Tm values of the new primers.
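Why short or AT-rich primers have low Tm values can be illustrated with the basic GC-content approximation Tm = 64.9 + 41(nGC - 16.4)/N. This formula is an assumption for illustration only: the method used for the Tm values in Table 7 is not stated, and the approximation gives different absolute numbers, so only the relative ordering is meaningful here. The two sequences are the 14-wzy-sense and 33A-specific-sense primers from Table 7.

```python
def basic_tm(primer):
    """Approximate Tm (deg C) from GC content: 64.9 + 41 * (nGC - 16.4) / N.

    A rough rule of thumb for primers of about 14 nt or longer; it ignores salt,
    primer concentration and nearest-neighbour effects. Orientation does not
    matter because only base composition is used. Only A/C/G/T are accepted.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G and T")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)


short_at_rich = basic_tm("GATTGGCTGTTCAAGTGT")     # 18 nt, 8 G/C
longer = basic_tm("TTGTTGTTGGGATTGTCTTGGG")        # 22 nt, 10 G/C
print(round(short_at_rich, 1), round(longer, 1))   # 45.8 53.0
```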

Nucleotide sequence accession numbers
Fifty-six new sequences generated in this study, for partial cpsA (wzg)-cpsB (wzh)
genes, were deposited in GenBank with accession numbers AY508586-AY508641.
These sequences form part of the present invention.
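An accession range such as AY508586-AY508641 can be expanded and checked against the stated count of 56 sequences. A small sketch, assuming the range is inclusive and that both accessions share the same letter prefix and digit width; `expand_accession_range` is a hypothetical helper.

```python
import re


def expand_accession_range(first, last):
    """Expand an inclusive accession range such as AY508586-AY508641."""
    m1 = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


accessions = expand_accession_range("AY508586", "AY508641")
print(len(accessions), accessions[0], accessions[-1])  # 56 AY508586 AY508641
```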

RESULTS AND DISCUSSION 

Conventional serotyping (CS) results

Conventional serotyping of 23 strains was repeated because of apparent
sharing of sequence types between two or more serotypes. After careful repetition by
two different persons, a previous serotype 42 isolate was confirmed to be serotype 31
and a previous serotype 41F isolate to be serotype 41A (Example 1); the serotypes of three
additional isolates were also corrected. The serotypes of the other 15 isolates were
confirmed to be as previously defined (including all the serotype 27, 28F and 16A
isolates, and one isolate each of serotypes 6A, 38 and 25F). The final results are shown in
Table 8.

Partial cpsA-cpsB sequencing primers

The sequencing primers cpsS1-cpsA3 produced amplicons from all strains
studied in this and our previous study, except for two belonging to the rare serotypes 25F
and 38, and five that were non-serotypeable (Example 1). Two additional primer pairs,
cpsS1-cpsA1 and cpsS3-cpsA2, formed amplicons from strains belonging to serotypes
25F and 38 and from two non-serotypeable isolates.
Table 7. Oligonucleotide primers used in this study.

Name of primer | Sequence and orientation of oligonucleotide | SEQ ID NO | Position | Tm (°C)
(illegible)-wzy-sense | 5'-'lTGAGCTATTTAAGGACCTGGG-3' | 144 | 395 | 58.4
(illegible)-wzy-antisense | 3'-AGTTCl lliCACTGCGAACGATT-5' | 145 | 677 | 58.4
(illegible)-wzy-sense | 5'-GTCAATAAGTTTAAGTGTTATAGGGC-3' | 146 | 51 | 59.0
(illegible)-wzy-antisense | 3'-CAAGCGTTGTGGGTAGTGATAT-5' | 147 | 337 | 63.5
(illegible) | 5'-GATQGGAAAATACGATATGCTC-3' | 148 | 427 | 56.1
(illegible)-wzy-antisense | 3'-CGACCTCAAAACAGTACCTCAA-5' | 149 | 736 | 58.5
(illegible) | 5'-CI 11 ATCAGGAATACGCCAATC-3' | 150 | 383 | 56.5
(illegible) | 3'-GCAACCAAGAGCAATAATATGTCC-5' | 151 | 683 | 58.3
(illegible)-sense | 5'-Cl l'lTCTTCGTATGCnTAGGG-3' | 152 | 93 | 56.3
(illegible)-antisense | 3'-GACTATCCACATTAGAGATAGAAGG-5' | 153 | 460 | 53.9
(illegible) | 5'-GTlC"ii-iGTll'GACCCTTCCTT-3' | 154 | 289 | 57.2
(illegible) | 3'-TATCi l ATGCGGTCTGTCGTAA-5' | 155 | 604 | 56.4
(illegible) | 5'-TTGTTCTTACATTTAGCCGTAGTG-3' | 156 | 434 | 56.9
(illegible)-wzy-antisense | 3'-GACAGTGAGATAGTGAGTCGTTTA-5' | 157 | 777 | 55.9
(illegible)-wzy-sense | 5'-CAGAGTTTGGTCGAGGTTCCTA-3' | 158 | 455 | 58.7
(illegible)-wzy-antisense | 3'-GAGTTAGTTGCTGCCTTTAGTG-5' | 159 | 782 | 59.7
28F-16A-wzy-sense | 5'-GATCCGCTCACGGTATGGACTA-3' | 160 | 261 | 61.6
28F-16A-wzy-antisense | 3'-GAATAACCGACTGTCGTnrTA A (end illegible) | 161 | (illegible) | 57.1
16F-wzx-sense | 5'-TTTATGAGGAGAGTACTGTATCAGA-3' | 162 | 1219 | 53.1
16F-wzx-antisense | 3'-ACTCAAGCTATCGATAGTAATTTGT-5' | 163 | 1433 | 56.6
27-wzx-sense | 5'-TACATn'lTATGAGAAGAGCATTG-3' | 164 | 1213 | 54.6
27-wzx-antisense | 3'-GCTATCAGTACTATTTTrTTCTCAC-5' | 165 | 1439 | 56.4
33A-specific-sense | 5'-TTGTTGTTGGGATTGTCTTGGG-3' | 166 | amplicon length | 62.1
33A-specific-antisense | 3'-G'lll'CAAGGCTrTAGGTTTCCG-5' | 167 | 246 bp | 62.9
9V-specific-sense | 5'-TCTTTGATTTCATCAGGGATTG-3' | 168 | amplicon length | 57.0
9V-specific-antisense | 3'-ATCACCATTGACGCAATCAGGA-5' | 169 | 545 bp | 54.2
15A-15B-15C-wzx-sense | 5'-ATTGCGACTGTTAAACGAGAAG-3' | 170 | 202 | 57.0
15A-15B-15C-wzx-antisense | 3'-CCGTGTCTAAATACCTTTATGT-5' | 171 | 514 | 55.0
15B-15C-wzy-sense | 5'-TAATAAGCGGATGATTGTAGCG-3' | 172 | 693 | 58.1
15B-15C-wzy-antisense | 3'-GGGTAGACCTTTCAATTAGTCA-5' | 173 | 1041 | 55.5
15A-wzy-sense | 5'-TATTTCCTTCCTATGGGACAAC-3' | 174 | 840 | 55.6
15A-wzy-antisense | 3'-CACCACTACTAATCGTAATAACA-5' | 175 | 1100 | 54.2
22F-22A-wzy-sense | 5'-AGGATGCAGTAGATACCAGTGG-3' | 176 | 398 | 56.1
22F-22A-wzy-antisense | 3'-CCTGTTGTTGGAGGCAAATATC-5' | 177 | 752 | 56.2
22F-22A-wzx-sense | 5'-GGTTCTATCAAGGAAAAGAGGAC-3' | 178 | 404 | 56.3
22F-22A-wzx-antisense | 3'-CAACCCAAGTCACTAACGATAA-5' | 179 | 672 | 56.3
11A-specific-sense | 5'-CACTTCCATATCCAGCAT-3' | 180 | 727-744 | 47.5
11A-specific-antisense | 3'-GACAGAGGACTATCAAGAGT-5' | 181 | 970-989 | 46.4
7A-wzy-specific-sense | 5'-GCAAGTGTTTCAATGGGAGTA-3' | 182 | 76 | 55.3
7A-wzy-specific-antisense | 3'-GAATAACATACCAGGGAGGCA-5' | 183 | 420 | 56.1
7A-wzx-specific-sense | 5'-l 1 rCjAGAATGCGGATAAGGTG-3' | 184 | 730 | 58.0
7A-wzx-specific-antisense | 3'-GAGTAACATTGTCCCGTTTGAA-5' | 185 | 1060 | 56.7
11A-11D-wzy-specific-sense | 5'-CGAAATATCGCCATTCATCAG-3' | 186 | 190 | 58.4
11A-11D-wzy-specific-antisense | 3'-TCACCGTGTCAACGACAACrAA-5' | 187 | 570 | 59.8
11A-11D-wzx-specific-sense | 5'-CAATCAATAATGCCGCATAC-3' | 188 | 856 | 54.3
11A-11D-wzx-specific-antisense | 3'-CTAAAGCAATCAAAGGTGTCCA-5' | 189 | 1140 | 55.6
12B-wzy-specific-sense | 5'-TGGAGGAGCAACTGACGTATT-3' | 190 | 518 | 57.3
12B-wzy-specific-antisense | 3'-GAGAAC1TATACCTGCCACCT-5' | 191 | 783 | 57.5
12B-wzx-specific-sense | 5'-GTATGTTATTCGTTAGACAAACTGG-3' | 192 | 1058 | 55.6
12B-wzx-specific-antisense | 3'-GACATCCAAATACATAACGCTCAA-5' | 193 | 1363 | 56.0
17F-wzy-specific-sense | 5'-CTATTTACCTTGTTTCCTGCAAC-3' | 194 | 490 | 56.1
17F-wzy-specific-antisense | 3'-CTATTGCGATACAGTCGTTAAG-5' | 195 | 838 | 54.9
(illegible)-wzx-specific-sense | 5'-GGATTACAAGAAATTCCCTCG-3' | 196 | 722 | 56.0
(illegible)-wzx-specific-antisense | 3'-TCCACTATACGCCTCGGTTAT-5' | 197 | 1094 | 59.8
47F-wzy-specific-sense | 5'-i HciGGTCTCCnTACCTATC-3' | 198 | 725 | 53.2
47F-wzy-specific-antisense | 3'-CACTACTTCTCAATCCCCTTT-5' | 199 | 1195 | 53.7
25A-29-wzy-specific-sense | 5'-CCGAAAATTGTTCACAGGATAC-3' | 200 | 112 | 56.8
25A-29-wzy-specific-antisense | 3'-CTATACGGAACATAGGTAGTTAG-5' | 201 | 474 | 55.9
47F-wzx-specific-sense | 5'-AGCAGCAATTGTTTCTGTCTTAACA-3' | 202 | 1128 | 60.6
47F-wzx-specific-antisense | 3'-GAGAl 1 ircACTATCTACACTATCTT-5' | 203 | 1389 | 52.8
25A-29-wzx-specific-sense | 5'-CTCCCTATCATTACTACTCCCTATG-3' | 204 | 58 | 56.2
25A-29-wzx-specific-antisense | 3'-AATCCACGCTGTCAAGAAAGTG-5' | 205 | 274 | 57.4
10C-10F-wzy-specific-sense | 5'-GTCAATAAGTTTAAGTGTTATAGGGC-3' | 206 | 51 | 56.2
10C-10F-wzy-specific-antisense | 3'-CAAGCGTTGTGGGTAGTGATAT-5' | 207 | 337 | 57.8
7C-wzy-sense | 5'-ACTCAAGTATCTGTGC/TCACCTT-3' | 208 | 453 | 55.7
7C-wzy-antisense | 3'-CCTCGTCCATCTCCTTCACTAA-5' | 209 | 703 | 57.1
7C-wzx-sense | 5'-TGAGTTrCCGATTAGAGCAG-3' | 210 | 317 | 53.0
7C-wzx-antisense | 3'-CCTTTACTACGCCATCCATA-5' | 211 | 740 | 54.4
9L-9N-wzy-sense | 5'-TCAATGGCGACrTTATTTGC-3' | 212 | 72 | 55.0
9L-9N-wzy-antisense | 3'-CGTGGGATGTCCTCTATTATCTGA-5' | 213 | 434 | 56.2
9L-9N-wzx-sense | 5'-GTACCGCAAGCTATTCTAATGA-3' | 214 | 388 | 54.9
9L-9N-wzx-antisense | 3'-GTCATTCTATCCGCTTCAAATAG-5' | 215 | 853 | 53.4
17A-wzy-sense | 5'-TAGACTTCTTAGAGCCTATTGTGG-3' | 216 | 722 | 55.3
17A-wzy-antisense | 3'-CTGGTTATCGCGTTTGACAATA-5' | 217 | 1040 | 56.9
17A-wzx-sense | 5'-CAAACCCTTAGTCCAATATGGCTG-3' | 218 | 624 | 62.2
17A-wzx-antisense | 3'-CCGATGGATAATAAGGGAAGCAAC-5' | 219 | 988 | 61.0
23A-wzy-sense | 5'-CATTTGGTATGGGAGTAGGGAG-3' | 220 | 1049 | 58.1
23A-wzy-antisense | 3'-GTGAAAGAGGATTGAGTACGTGG-5' | 221 | 1326 | 58.5
33B-48-wzy-sense | 5'-TAATCAA/GTGGTCTGGTGGTCA/GA-3' | 222 | 453 | 57.9
33B-48-wzy-antisense | 3'-GAAAC/rAAT/CGAGGATAACT/CGACT-5' | 223 | 815 | 57.2
23F-wzy-sense | 5'-TGTCAGCAGAAAATATGACGC-3' | 224 | 402 | 56.4
23F-wzy-antisense | 3'-CCTTTATGCTGCTrCCCAATAC-5' | 225 | 766 | 58.4
34-wzy-sense | 5'-TTGTTGTAGTGGCAGTrGCrcC-3' | 226 | 740 | 60.4
34-wzy-antisense | 3'-CGGATGTCCCTTACAGAAATGTTG-5' | 227 | 1070 | 59.4
35A-wzy-sense | 5'-TCCTGATTATG/ATTGAGATTTG/CG-3' | 228 | 399 | 54.7
35A-wzy-antisense | 3'-GACCTAACGCTTCTGAATOAAT-5' | 229 | 747 | 54.8
36-wzy-sense | 5'-CAATTTCCCCTTATrCTGTAGTTC-3' | 230 | 692 | 56.8
36-wzy-antisense | 3'-CTCTCTTGTCATATrTGTCCCAGTT-5' | 231 | 1026 | 57.0
39(l)-wzy-sense | 5'-GATTGGTTTGGGAACTTGATGTC-3' | 232 | 232 | 60.2
39-wzy-antisense | 3'-CACCATACTCCATAGTAAATCGTCC-5' | 233 | 518 | 59.5
41A-wzy-sense | 5'-GTAGTTACTGGCCCTTTCTTATrCC-3' | 234 | 511 | 59.7
41A-wzy-antisense | 3'-GTTCTACGTCTATCAAAGAGCGAT-5' | 235 | 828 | 59.0
41A-wzx-sense | 5'-CAGCAAATGCAGGTTCTCAAA-3' | 236 | 278 | 59.0
41A-wzx-antisense | 3'-ACTGTGGAGCAGATCGTATAGTAAT-5' | 237 | 566 | 58.9
43-wzy-sense | 5'-GATCAAATGGTGGTATTAGGAA-3' | 238 | 251 | 54.0
43-wzy-antisense | 3'-CGGTCAGTATAAAAGGTTAAGA-5' | 239 | 601 | 55.8
43-wzx-sense | 5'-TTCTTATCGCTTCCATTGTCAG-3' | 240 | 907 | 57.5
43-wzx-antisense | 3'-CCACATTCACCTCGTCGTAAA-5' | 241 | 1182 | 57.1
47A-wzy-sense | 5'-TATTTGCCATAACGGACrcTAGAAC-3' | 242 | 485 | 59.5
47A-wzy-antisense | 3'-CACCAATACACCCAAATTAAGAAGC-5' | 243 | 830 | 61.5
47A-wzx-sense | 5'-Hi uGGCTCTTTAGGTAGTGTAT-3' | 244 | 687 | 55.4
47A-wzx-antisense | 3'-CTGCCTATrACAAGCTATCAAATG-5' | 245 | 1064 | 55.3
48-wzy-sense | 5'-CATTTGGAGTTATTGCCCTAC-3' | 246 | 602 | 54.5
48-wzy-antisense | 3'-CCCCAGAATTAAATCTTATACX;C-5' | 247 | 909 | 56.6
48-wzx-sense | 5'-AGGGCrrTAACTGTTTCAGTGTT-3' | 248 | 782 | 55.5
48-wzx-antisense | 3'-CTAAACCATATCGTCCTGACTT-5' | 249 | 1113 | 54.2
33C-wzy-sense | 5'-TTATCTATATGTTAGGGCTG-3' | 250 | 197 | 45.3
33C-wzy-antisense | 3'-CTGTGAAGACTrACAACATG-5' | 251 | 445 | 43.7
23B-wzy-sense | 5'-TTGGATCGTTGTTCATAGCGG-3' | 252 | 639 | 61.0
23B-wzy-antisense | 3'-GACACCTTTACGGCAACGATTC-5' | 253 | 947 | 62.5
23B-wzx-sense | 5'-AGCGAGCGGTATCATTCTATTTG-3' | 254 | 897 | 60.8
23B-wzx-antisense | 3'-CTATCACAACTrCnTAACGAGGTC-5' | 255 | 1219 | 59.6
24B-wzy-sense | 5'-TCAACACTTATGATGGTGCCTG-3' | 256 | 685 | 58.5
24B-wzy-antisense | 3'-ATCTTCACCCTAATAGCCCGA-5' | 257 | 1025 | 58.3
25F-38-wzy-sense | 5'-AATCTGAGGAAACTTGGAGCAA-3' | 258 | 641 | 58.5
25F-38-wzy-antisense | 3'-GCATAATTGCTAATCTrAACAAGG-5' | 259 | 977 | 55.8
25F-38-wzx-sense | 5'-GCAATGGTTTATGGATGATAGAGCG-3' | 260 | 702 | 64.3
25F-38-wzx-antisense | 3'-TGTGCTGCTAACGACCACGAAA-5' | 261 | 1088 | 64.4
31-wzy-sense | 5'-TGAAAATCCCTTAGTGACATCTG-3' | 262 | 492 | 56.5
31-wzy-antisense | 3'-GACCAGCATCGTAAAGAGTCTA-5' | 263 | 794 | 56.5
32A-32F-wzy-sense | 5'-CGGTATGCTTACAATGAGACGC-3' | 264 | 813 | 60.2
32A-32F-wzy-antisense | 3'-GTAGAATAGGCCCTTGCTTAAG-5' | 265 | 1163 | 60.5
32A-32F-wzx-sense | 5'-GTAACGATGCCTAGAATGACTT-3' | 266 | 799 | 53.6
32A-32F-wzx-antisense | 3'-CACACCAlTATCCACGACAATAG-5' | 267 | 1107 | 53.9
35B-wzy-sense | 5'-CTAATTTGGCTATGAAGCTAATCCC-3' | 268 | 626 | 60.6
35B-wzy-antisense | 3'-CAAATGACTGACGCTGAAATCACTr-5' | 269 | 1019 | 58.2
45-wzy-sense | 5'-CTATGCAGGAAATATCCGAGAAGG-3' | 270 | 111 | 61.7
45-wzy-antisense | 3'-GTATCGCAAAGACAAAGTGCCTAG-5' | 271 | 497 | 63.0
45-wzx-sense | 5'-AATGGCTTGCTCCTATTGCTGT-3' | 272 | 929 | 60.9
45-wzx-antisense | 3'-CGTTTAGCAAGAACCCTATCATC-5' | 273 | 1306 | 58.1
41F-wzx-sense | 5'-GTCAAAGACAGGAATGACATCTATG-3' | 274 | 493 | 57.7
41F-wzx-antisense | 3'-CCCTCCTTCACGAAAATAAAGA-5' | 275 | 972 | 56.9
18A-18B-18C-18F-wzx-sense | 5'-GGAATCGGACAATAGCAC-3' | 276 | 35 | 50.2
18A-18B-18C-18F-wzx-antisense | 3'-ACCAGAACTTCTCAAAGCAT-5' | 277 | 265 | 50.5
19B-19C-wzx-sense | 5'-GGCATCAAAGGTTAAGTG-3' | 278 | 744 | 48.0
19B-19C-wzx-antisense | 3'-GAAGACAGCGTTGAGAAA-5' | 279 | 1171 | 47.5
19F-wzx-sense | 5'-GCTATCTAACATTGCGAGTA-3' | 280 | 672 | 48.4
19F-wzx-antisense | 3'-AAACCGAAGGACGAATAT-5' | 281 | 967 | 49.1
2-wzx-sense | 5'-TAGCGGTGAATGGCATCT-3' | 282 | 644 | 54.1
2-wzx-antisense | 3'-AGTTGGAATCATCCTCGCT-5' | 283 | 1012 | 50.6
23A-23F-wzx-sense | 5'-GGGAAATGGTTTACTATCfC-3' | 284 | 623 | 49.7
23A-23F-wzx-antisense | 3'-GTTCTTCTATTCrcGCC(T)A-5' | 285 | 843 | 47.0
6A-6B-wzx-sense | 5'-ATTTATGAAGGGAAGATGG-3' | 286 | 1003 | 49.0
6A-6B-wzx-antisense | 3'-CCGAGCGTCATTATCAAA-5' | 287 | 1324 | 47.6
8-wzx-sense | 5'-TATGTnCAAGGGTTCTG-3' | 288 | 88 | 45.2
8-wzx-antisense | 3'-CCTTACCGTCGAATAATA-5' | 289 | 356 | 47.4
9A-9V-wzx-sense | 5'-TGATAAGGCTTACCAGTT-3' | 290 | 732 | 44.6
9A-9V-wzx-antisense | 3'-CTGACCATAACCCTGATT-5' | 291 | 1360 | 44.0
12F-12B-44-46-wzy-sense | 5'-TGAATATGGACGGTGGAG-3' | 292 | 767 | 51.1
12F-12B-44-46-wzy-antisense | 3'-GAAAGCCGAAAGAAACGA-5' | 293 | 1008 | 53.1
14-wzy-sense | 5'-GATTGGCTGTTCAAGTGT-3' | 294 | 230 | 47.3
14-wzy-antisense | 3'-CCCTGCCTAAATGTAATC-5' | 295 | 463 | 47.2
16F-wzy-sense | 5'-TTGTTCTTACATTTAGCCGT-3' | 296 | 434 | 50.6
16F-wzy-antisense | 3'-CCCTGAACCTAAACCATT-5' | 297 | 737 | 49.9
18A-18B-18C-18F-wzy-sense | 5'-CATGAAGTTGCACCTATT-3' | 298 | 409 | 45.2
18A-18B-18C-18F-wzy-antisense | 3'-CCCTATCCCAAACATTGT-5' | 299 | 840 | 47.2
19F-wzy-sense | 5'-AAACGGAAAGTTGGATGG-3' | 300 | 667 | 52.8
19F-wzy-antisense | 3'-CAGAAACGACATCCACGAA-5' | 301 | 1075 | 49.9
2-3-wzy-sense | 5'-TGTCGGCATTGTATTCTTTA-3' | 302 | 59 | 51.9
2-3-wzy-antisense | 3'-CCCAGTCCTAAACCACCA-5' | 303 | 855 | 54.4
37-33F-33A-wzy-sense | 5'-TAGGGAAATGGGCGACTC-3' | 304 | 101 | 55.4
37-33F-33A-wzy-antisense | 3'-ACCTCAAACCATAACTCGGA-5' | 305 | 596 | 54.7
6A-6B-wzy-sense | 5'-ATTCCAGCGACTACACTT-3' | 306 | 496 | 46.7
6A-6B-wzy-antisense | 3'-AATCACCACCATCTAACG-5' | 307 | 634 | 45.2
8-wzy-sense | 5'-CACGCAGACTAGAACAGC-3' | 308 | 606 | 48.5
8-wzy-antisense | 3'-GAACCAGATACATACGCCA-5' | 309 | 1055 | 50.5
9A-9V-wzy-sense | 5'-GTTGGTTTCGACTCTTTG-3' | 310 | 394 | 47.5
9A-9V-wzy-antisense | 3'-TnTGCGATGACTGTrAC-5' | 311 | 1017 | 45.7
19B-19C-wzy-sense | 5'-TTCGGAGATTTGTCGTAT-3' | 312 | 478 | 47.5
19B-19C-wzy-antisense | 3'-AGCAAATACCTCCACCTA-5' | 313 | 772 | 50.0
1-wzx-sense | 5'-TGGAGAATTTGCGATTACG-3' | 314 | 744 | 54.5
1-wzx-antisense | 3'-TAGAGTCCCATTTGTCTCAC-5' | 315 | 886 | 48.6
4-wzx-sense | 5'-AATGCnTGTACTACTCCCTC-3' | 316 | 88 | 48.5
4-wzx-antisense | 3'-GATACTAAATGCCTACCG-5' | 317 | 898 | 48.1
19A-wzx-sense | 5'-TTCCCTATGTCAGTCTATGAA-3' | 318 | 1000 | 49.7
19A-wzx-antisense | 3'-TCTTCATAGTATCGGCTTAA-5' | 319 | 1214 | 48.8
1-wzy-sense | 5'-TATTCTATITCTTACCCGCTAC-3' | 320 | 211 | 51.6
1-wzy-antisense | 3'-ATTCACCCGTTCAAAGTAGA-5' | 321 | 801 | 52.4
4-wzy-sense | 5'-GTGCCTAGTAGCATTCCATA-3' | 322 | 1003 | 50.5
4-wzy-antisense | 3'-GAAACCAATGATACCACCAC-5' | 323 | 1198 | 50.4
19A-wzy-sense | 5'-TCGCCTAGTCTAAATACCAA-3' | 324 | 235 | 50.7
19A-wzy-antisense | 3'-AAGTGAATCTTAAAGCCGTC-5' | 325 | 975 | 53.4
17F-YS2-sense | 5'-AGAGGGATTGTTGAAGGTATTC-3' | 326 | 754 | 59.8
17F-YA2-antisense | 3'-CCTACTATCTTTACGCTCTGAT-5' | 327 | 1060 | 59.7
25F-38-YS-sense | 5'-GGCGTTGTCAGTGCTAGTTTAG-3' | 328 | 121 | 62.6
25F-38-YA-antisense | 3'-CTCATATTACX:GACGAAATTGTCC-5' | 329 | 713 | 61.6
35F-47F-YS-sense | 5'-ATAAAAAGAAAGTCTTTGCCAGAG-3' | 330 | 13 | 60.6
35F-47F-YA-antisense | 3'-CTACTACTTGTATCAGCGATAAC-5' | 331 | 499 | 60.0
25A-29-YS-sense | 5'-CCGAAAATTGTTCACAGGATAC-3' | 332 | 112 | 62.0
25A-29-YA-antisense | 3'-CTATACGGAACATAGGTAGTTAG-5' | 333 | 474 | 60.9



Updated sequence type nomenclature (compared with Example 1)

Sequence types were generally named according to the corresponding serotype,
with a suffix representing the source of the isolate in which the sequence type was first
identified. When sequences characteristic of two to five serotypes were identified, the
sequence type name included all of them, with the lower-numbered serotype first (e.g.
15B-15C-22F-22A) (Henrichsen, 1995). Representative sequences of all sequence types
were deposited in GenBank (see Table 8 for sequence type nomenclature and the
corresponding GenBank accession numbers).
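The naming rule for shared sequence types can be expressed as a sort key. A sketch, assuming (our interpretation, not stated in the text) that serotypes are ordered by serogroup number and, within a serogroup, "F" (commonly read as the first-described type) precedes the other letters, which are then alphabetical; this reproduces the example 15B-15C-22F-22A. The source-of-isolate suffix is not modelled, and `serotype_key` and `shared_type_name` are hypothetical names.

```python
import re


def serotype_key(serotype):
    """Sort key: serogroup number, then F before other letters, then the letter."""
    match = re.fullmatch(r"(\d+)([A-Z]?)", serotype)
    if match is None:
        raise ValueError(f"unrecognised serotype: {serotype!r}")
    group, letter = int(match.group(1)), match.group(2)
    return (group, letter != "F", letter)


def shared_type_name(serotypes):
    """Join the serotypes sharing a sequence type, lower-numbered serotype first."""
    return "-".join(sorted(set(serotypes), key=serotype_key))


print(shared_type_name(["22A", "15C", "22F", "15B"]))  # 15B-15C-22F-22A
print(shared_type_name(["28F", "16A"]))                # 16A-28F
```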
Are the shared sequence types plausible?

To explain the many shared sequence types, we studied their antigenic
formulae (Henrichsen, 1995). Among the 31 shared sequence types (Table 9), six were
shared between unrelated serotypes (2-41A, 10A-17A, 10A-23F, 13-20, 25A-29, 33D-
48), four were shared between two or three related serotypes and at least one other
unrelated serotype (7B-40, 11A-11D-18F, 27-28F-28A, 17F-35B-35C-42) and 20 were
shared between antigenically related serotypes. The remaining shared sequence type
involved serotypes 16A and 28F; although they are not directly related, 28F is related to
serogroup 16 (Table 9) (Henrichsen, 1995). Thus most shared molecular capsular
sequence types (genotypes) involve closely related serotypes (phenotypes). The 10
shared sequence types that involve unrelated or more distantly related serotypes (such
as 16A-28F) can probably be explained by recombination events between serotypes.

Are wzx and wzy helpful?

In Example 1 it was shown that wzy- and wzx-based PCRs increase the accuracy
of cpsA-cpsB sequence-based serotype prediction. Thus, in order to extend our
serotype-prediction strategy to all 90 serotypes, we examined the wzx and wzy
sequences of the 90 serotypes, especially those of the 31 shared sequence types (Tables 7
and 9). In addition to the sequences we have determined, the unannotated sequences
from the cps gene clusters of all 90 serotypes, as determined by the Sanger Institute,
were used to determine the 90 wzx and wzy sequences. The identification of suitable
serotype-specific wzx- and wzy-based primers was far from straightforward. For most
of the 90 serotypes, wzy is shorter but more heterogeneous than wzx and therefore a
more suitable single target for serotype-specific PCR. The wzy sequencing results
showed that it would be helpful for discriminating the serotype pairs 7C-40, 10F-10C,
12A/46 (identical)-12F/12B/44 (identical), 35A-35C/42 (identical) and 35F-47F.

It was found that the wzx genes from 28 different serotypes share a high level of
homology (72% to 100%). We found three main recombination sites in these 28 wzx
genes (base positions 395, 775 and 1150) using the programme PhylPro 1.0 (Weiller,
1998), which generated the diagrammatic representation of polymorphic sites and
hypothetical recombination events of the wzx gene shown in Figure 6.
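The 72% to 100% homology range quoted for the 28 wzx genes is a minimum and maximum over pairwise comparisons. A sketch of that kind of calculation, assuming pre-aligned nucleotide sequences and defining identity as matching positions divided by the columns where neither sequence has a gap (other definitions give somewhat different numbers). The toy alignment is illustrative only, and the PhylPro recombination analysis is not reproduced.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two aligned sequences over columns without gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    same = sum(x == y for x, y in compared)
    return 100.0 * same / len(compared)


def identity_range(aligned):
    """(min, max) pairwise percent identity across a dict of aligned sequences."""
    values = [percent_identity(a, b) for a, b in combinations(aligned.values(), 2)]
    return min(values), max(values)


toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "ACG-ACGTTT"}
low, high = identity_range(toy)
print(round(low, 1), high)  # 77.8 90.0
```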
Comprehensive molecular capsular sequence typing results

The final molecular capsular sequence typing results for 519 isolates (427
previously studied and 92 new isolates) are shown in Table 9. Our database now
includes 90 S. pneumoniae serotypes and 134 sequence types (including two non-
serotypeable strains). Eighty-three serotypes are represented by two or more strains. Of
the sequence types, 102 (not including the two non-serotypeable strains), including 47 that
are represented by two or more isolates, correspond to a single serotype; 23 sequence
types are shared by two serotypes, six are shared by three serotypes and two are shared
by four serotypes (Table 8).


It will be appreciated by persons skilled in the art that numerous variations
and/or modifications may be made to the invention as shown in the specific
embodiments without departing from the spirit or scope of the invention as broadly
described. The present embodiments are, therefore, to be considered in all respects as
illustrative and not restrictive.

All publications discussed above are incorporated herein in their entirety.
Any discussion of documents, acts, materials, devices, articles or the like which
has been included in the present specification is solely for the purpose of providing a
context for the present invention. It is not to be taken as an admission that any or all of
these matters form part of the prior art base or were common general knowledge in the
field relevant to the present invention as it existed before the priority date of each claim
of this application.